osmotic stress (Albertyn et al., Mol. Cell. Biol. 14, 4135-4144 (1994)). Earlier
this century, commercial glycerol production was achieved by the use of
Saccharomyces cultures to which "steering reagents" such as sulfites or alkalis
were added. Through the formation of an inactive complex, the steering agents
block or inhibit the conversion of acetaldehyde to ethanol; thus, excess reducing
equivalents (NADH) are available to, or "steered" towards, DHAP for reduction to
produce glycerol. This method is limited by the partial inhibition of yeast growth
caused by the sulfites. This limitation can be partially overcome by the use of
alkalis, which create excess NADH equivalents by a different mechanism. In this
practice, the alkalis initiate a Cannizzaro disproportionation to yield ethanol and
acetic acid from two equivalents of acetaldehyde.

The gene encoding glycerol-3-phosphate dehydrogenase (DAR1, GPD1)
has been cloned and sequenced from S. diastaticus (Wang et al., J. Bact. 176,
7091-7095 (1994)). The DAR1 gene was cloned into a shuttle vector and used to
transform E. coli, where expression produced active enzyme. Wang et al. (supra)
recognize that DAR1 is regulated by the cellular osmotic environment but do not
suggest how the gene might be used to enhance 1,3-propanediol production in a
recombinant organism.

Other glycerol-3-phosphate dehydrogenase enzymes have been isolated:
for example, sn-glycerol-3-phosphate dehydrogenase has been cloned and
sequenced from S. cerevisiae (Larsson et al., Mol. Microbiol. 10, 1101 (1993)),
and Albertyn et al. (Mol. Cell. Biol. 14, 4135 (1994)) teach the cloning of GPD1
encoding a glycerol-3-phosphate dehydrogenase from S. cerevisiae. Like Wang et
al. (supra), both Albertyn et al. and Larsson et al. recognize the osmo-sensitivity
of the regulation of this gene but do not suggest how the gene might be used in the
production of 1,3-propanediol in a recombinant organism.

As with G3PDH, glycerol-3-phosphatase has been isolated from
Saccharomyces cerevisiae and the protein identified as being encoded by the
GPP1 and GPP2 genes (Norbeck et al., J. Biol. Chem. 271, 13875 (1996)). Like
the genes encoding G3PDH, it appears that GPP2 is osmosensitive.

Although biological methods of both glycerol and 1,3-propanediol
production are known, it has never been demonstrated that the entire process can 
be accomplished by a single recombinant organism. 

Neither the chemical nor biological methods described above for the
production of 1,3-propanediol are well suited for industrial scale production since
the chemical processes are energy intensive and the biological processes require
the expensive starting material, glycerol. A method requiring low energy input
and an inexpensive starting material is needed. A more desirable process would
incorporate a microorganism that would have the ability to convert basic carbon
sources such as carbohydrates or sugars to the desired 1,3-propanediol
end-product.

Although a single organism conversion of fermentable carbon source other
than glycerol or dihydroxyacetone to 1,3-propanediol would be desirable, it has
been documented that there are significant difficulties to overcome in such an
endeavor. For example, Gottschalk et al. (EP 373 230) teach that the growth of
most strains useful for the production of 1,3-propanediol, including Citrobacter
freundii, Clostridium autobutylicum, Clostridium butylicum, and Klebsiella
pneumoniae, is disturbed by the presence of a hydrogen donor such as fructose or
glucose. Strains of Lactobacillus brevis and Lactobacillus buchner, which
produce 1,3-propanediol in co-fermentations of glycerol and fructose or glucose,
do not grow when glycerol is provided as the sole carbon source, and, although it
has been shown that resting cells can metabolize glucose or fructose, they do not
produce 1,3-propanediol (Veiga DA Cunha et al., J. Bacteriol. 174, 1013
(1992)). Similarly, it has been shown that a strain of Ilyobacter polytropus, which
produces 1,3-propanediol when glycerol and acetate are provided, will not
produce 1,3-propanediol from carbon substrates other than glycerol, including
fructose and glucose (Steib et al., Arch. Microbiol. 140, 139 (1984)). Finally,
Tong et al. (Appl. Biochem. Biotech. 34, 149 (1992)) have taught that recombinant
Escherichia coli transformed with the dha regulon encoding glycerol dehydratase
does not produce 1,3-propanediol from either glucose or xylose in the absence of
exogenous glycerol.

Attempts to improve the yield of 1,3-propanediol from glycerol have been
reported where co-substrates capable of providing reducing equivalents, typically
fermentable sugars, are included in the process. Improvements in yield have been
claimed for resting cells of Citrobacter freundii and Klebsiella pneumoniae DSM
4270 cofermenting glycerol and glucose (Gottschalk et al., supra, and Tran-Dinh
et al., DE 3734 764), but not for growing cells of Klebsiella pneumoniae
ATCC 25955 cofermenting glycerol and glucose, which produced no
1,3-propanediol (I-T. Tong, Ph.D. Thesis, University of Wisconsin-Madison
(1992)). Increased yields have been reported for the cofermentation of glycerol
and glucose or fructose by a recombinant Escherichia coli; however, no
1,3-propanediol is produced in the absence of glycerol (Tong et al., supra). In
these systems, single organisms use the carbohydrate as a source of generating
NADH while providing energy and carbon for cell maintenance or growth. These
disclosures suggest that sugars do not enter the carbon stream that produces
1,3-propanediol. In no case is 1,3-propanediol produced in the absence of an
exogenous source of glycerol. Thus the weight of literature clearly suggests that
the production of 1,3-propanediol from a carbohydrate source by a single
organism is not possible.

The problem to be solved by the present invention is the biological
production of 1,3-propanediol by a single recombinant organism from an
inexpensive carbon substrate such as glucose or other sugars. The biological
production of 1,3-propanediol requires glycerol as a substrate for a two-step
sequential reaction in which a dehydratase enzyme (typically a coenzyme
B12-dependent dehydratase) converts glycerol to an intermediate,
3-hydroxypropionaldehyde, which is then reduced to 1,3-propanediol by a NADH-
(or NADPH-) dependent oxidoreductase. The complexity of the cofactor
requirements necessitates the use of a whole cell catalyst for an industrial
process which utilizes this reaction sequence for the production of
1,3-propanediol. Furthermore, in order to make the process economically viable,
a less expensive feedstock than glycerol or dihydroxyacetone is needed. Glucose
and other carbohydrates are suitable substrates, but, as discussed above, are
known to interfere with 1,3-propanediol production. As a result no single
organism has been shown to convert glucose to 1,3-propanediol.



Applicants have solved the stated problem and the present invention
provides for bioconverting a fermentable carbon source directly to
1,3-propanediol using a single recombinant organism. Glucose is used as a model
substrate and the bioconversion is applicable to any existing microorganism.
Microorganisms harboring the genes encoding glycerol-3-phosphate
dehydrogenase (G3PDH), glycerol-3-phosphatase (G3P phosphatase), glycerol
dehydratase (dhaB), and 1,3-propanediol oxidoreductase (dhaT) are able to
convert glucose and other sugars through the glycerol degradation pathway to
1,3-propanediol with good yields and selectivities. Furthermore, the present
invention may be generally applied to include any carbon substrate that is readily
converted to 1) glycerol, 2) dihydroxyacetone, 3) C3 compounds at the
oxidation state of glycerol (e.g., glycerol 3-phosphate), or 4) C3 compounds at the
oxidation state of dihydroxyacetone (e.g., dihydroxyacetone phosphate or
glyceraldehyde 3-phosphate).

SUMMARY OF THE INVENTION

The present invention provides a method for the production of
1,3-propanediol from a recombinant organism comprising:

(i) transforming a suitable host organism with a transformation
cassette comprising at least one of (a) a gene encoding a glycerol-3-phosphate
dehydrogenase activity; (b) a gene encoding a glycerol-3-phosphatase activity;

WO 98/21339 PCT/US97/20292

(c) genes encoding a dehydratase activity; and (d) a gene encoding
1,3-propanediol oxidoreductase activity, provided that if the transformation
cassette comprises less than all the genes of (a)-(d), then the suitable host
organism comprises endogenous genes whereby the resulting transformed host
organism comprises at least one of each of genes (a)-(d);

(ii) culturing the transformed host organism under suitable conditions
in the presence of at least one carbon source selected from the group consisting of
monosaccharides, oligosaccharides, polysaccharides, or a one carbon substrate
whereby 1,3-propanediol is produced; and

(iii) recovering the 1,3-propanediol.
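
The proviso in step (i) amounts to a coverage condition: any of the activities (a)-(d) that the cassette does not supply must already be present in the host. A minimal Python sketch of that check is given here for illustration; the activity labels, the function names, and the example host are assumptions made for the sketch and are not terms defined by the specification.

```python
# Illustrative sketch only: labels and function names are assumptions.
REQUIRED_ACTIVITIES = frozenset({
    "G3PDH",            # (a) glycerol-3-phosphate dehydrogenase
    "G3P phosphatase",  # (b) glycerol-3-phosphatase
    "dehydratase",      # (c) dehydratase
    "oxidoreductase",   # (d) 1,3-propanediol oxidoreductase
})


def missing_activities(cassette, endogenous):
    """Return the activities of (a)-(d) supplied by neither source."""
    return REQUIRED_ACTIVITIES - (set(cassette) | set(endogenous))


def satisfies_proviso(cassette, endogenous):
    """True when the cassette carries at least one of (a)-(d) and the
    transformed host ends up with all four activities."""
    carries_one = bool(set(cassette) & REQUIRED_ACTIVITIES)
    return carries_one and not missing_activities(cassette, endogenous)


# Hypothetical host that already expresses (a) and (b);
# the cassette contributes (c) and (d).
host_genes = {"G3PDH", "G3P phosphatase"}
cassette_genes = {"dehydratase", "oxidoreductase"}
print(satisfies_proviso(cassette_genes, host_genes))
print(sorted(missing_activities({"dehydratase"}, host_genes)))
```

The second call shows how the same helper reports which activity would still have to be supplied if the cassette carried only the dehydratase genes.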

The invention further provides transformed hosts comprising expression
cassettes capable of expressing glycerol-3-phosphate dehydrogenase,
glycerol-3-phosphatase, glycerol dehydratase and 1,3-propanediol oxidoreductase
activities for the production of 1,3-propanediol.

The suitable host organism used in the method is selected from the group
consisting of bacteria, yeast, and filamentous fungi. The suitable host organism is
more particularly selected from the group of genera consisting of Citrobacter,
Enterobacter, Clostridium, Klebsiella, Aerobacter, Lactobacillus, Aspergillus,
Saccharomyces, Schizosaccharomyces, Zygosaccharomyces, Pichia,
Kluyveromyces, Candida, Hansenula, Debaryomyces, Mucor, Torulopsis,
Methylobacter, Escherichia, Salmonella, Bacillus, Streptomyces and
Pseudomonas. Most particularly, the suitable host organism is selected from the
group consisting of E. coli, Klebsiella spp., and Saccharomyces spp. Particular
transformed host organisms used in the method are 1) a Saccharomyces spp.
transformed with a transformation cassette comprising the genes dhaB1, dhaB2,
dhaB3, and dhaT, wherein the genes are stably integrated into the Saccharomyces
spp. genome; and 2) a Klebsiella spp. transformed with a transformation cassette
comprising the genes GPD1 and GPD2.

The preferred carbon source of the invention is glucose. 

The method further uses the gene encoding a glycerol-3-phosphate
dehydrogenase enzyme selected from the group consisting of genes corresponding
to amino acid sequences given in SEQ ID NO:11, in SEQ ID NO:12, and in SEQ
ID NO:13, the amino acid sequences encompassing amino acid substitutions,
deletions or additions that do not alter the function of the glycerol-3-phosphate
dehydrogenase enzyme. The method also uses the gene encoding a
glycerol-3-phosphatase enzyme selected from the group consisting of genes
corresponding to amino acid sequences given in SEQ ID NO:33 and in SEQ ID
NO:17, the amino acid sequences encompassing amino acid substitutions,
deletions or additions that do not alter the function of the
glycerol-3-phosphatase enzyme. The method also uses the gene encoding a
glycerol kinase enzyme that corresponds to an amino acid sequence given in SEQ
ID NO:18, the amino acid sequence encompassing amino acid substitutions,
deletions or additions that do not alter the function of the glycerol kinase
enzyme. The genes encoding a dehydratase enzyme used in the method comprise
dhaB1, dhaB2 and dhaB3, the genes corresponding respectively to amino acid
sequences given in SEQ ID NO:34, SEQ ID NO:35, and SEQ ID NO:36, the amino
acid sequences encompassing amino acid substitutions, deletions or additions that
do not alter the function of the dehydratase enzyme. The method also uses the
gene encoding a 1,3-propanediol oxidoreductase enzyme that corresponds to an
amino acid sequence given in SEQ ID NO:37, the amino acid sequence
encompassing amino acid substitutions, deletions or additions that do not alter
the function of the 1,3-propanediol oxidoreductase enzyme.

The invention is also embodied in a transformed host cell comprising:

(a) a group of genes comprising

(1) a gene encoding a glycerol-3-phosphate dehydrogenase
enzyme corresponding to the amino acid sequence given in SEQ ID NO:11;

(2) a gene encoding a glycerol-3-phosphatase enzyme
corresponding to the amino acid sequence given in SEQ ID NO:17;

(3) a gene encoding the α subunit of the glycerol dehydratase
enzyme corresponding to the amino acid sequence given in SEQ ID NO:34;

(4) a gene encoding the β subunit of the glycerol dehydratase
enzyme corresponding to the amino acid sequence given in SEQ ID NO:35;

(5) a gene encoding the γ subunit of the glycerol dehydratase
enzyme corresponding to the amino acid sequence given in SEQ ID NO:36; and

(6) a gene encoding the 1,3-propanediol oxidoreductase enzyme
corresponding to the amino acid sequence given in SEQ ID NO:37,

the respective amino acid sequences of (a)(1)-(6) encompassing amino acid
substitutions, deletions, or additions that do not alter the function of the enzymes
of genes (1)-(6), and

(b) a host cell transformed with the group of genes of (a), whereby
the transformed host cell produces 1,3-propanediol on at least one substrate
selected from the group consisting of monosaccharides, oligosaccharides, and
polysaccharides or from a one-carbon substrate.

BRIEF DESCRIPTION OF BIOLOGICAL DEPOSITS AND SEQUENCE LISTING

The transformed E. coli W2042 (comprising the E. coli host W1485 and
plasmids pDT20 and pAH42) containing the genes encoding glycerol-3-phosphate
dehydrogenase (G3PDH) and glycerol-3-phosphatase (G3P phosphatase), glycerol
dehydratase (dhaB), and 1,3-propanediol oxidoreductase (dhaT) was deposited on
26 September 1996 with the ATCC under the terms of the Budapest Treaty on the
International Recognition of the Deposit of Micro-organisms for the Purpose of
Patent Procedure and is designated as ATCC 98188.

S. cerevisiae YPH500 harboring plasmids pMCK10, pMCK17, pMCK30
and pMCK35 containing genes encoding glycerol-3-phosphate dehydrogenase
(G3PDH) and glycerol-3-phosphatase (G3P phosphatase), glycerol dehydratase
(dhaB), and 1,3-propanediol oxidoreductase (dhaT) was deposited on
26 September 1996 with the ATCC under the terms of the Budapest Treaty on the
International Recognition of the Deposit of Micro-organisms for the Purpose of
Patent Procedure and is designated as ATCC 74392.

"ATCC" refers to the American Type Culture Collection international 
depository located at 12301 Parklawn Drive, Rockville, MD 20852 U.S.A. The
designations refer to the accession number of the deposited material.

Applicants have provided 49 sequences in conformity with Rules for the
Standard Representation of Nucleotide and Amino Acid Sequences in Patent
Applications (Annexes I and II to the Decision of the President of the EPO,
published in Supplement No. 2 to OJ EPO, 12/1992) and with 37 C.F.R.
1.821-1.825 and Appendices A and B (Requirements for Application Disclosures
Containing Nucleotides and/or Amino Acid Sequences).

DETAILED DESCRIPTION OF THE INVENTION

The present invention provides a method for a biological production of
1,3-propanediol from a fermentable carbon source in a single recombinant
organism. The method incorporates a microorganism containing genes encoding
glycerol-3-phosphate dehydrogenase (G3PDH), glycerol-3-phosphatase (G3P
phosphatase), glycerol dehydratase (dhaB), and 1,3-propanediol oxidoreductase
(dhaT). The recombinant microorganism is contacted with a carbon substrate and
1,3-propanediol is isolated from the growth media.

The present method provides a rapid, inexpensive and environmentally
responsible source of 1,3-propanediol monomer useful in the production of
polyesters and other polymers.

The following definitions are to be used to interpret the claims and 
specification. 

The terms "glycerol dehydratase" or "dehydratase enzyme" refer to the
polypeptide(s) responsible for an enzyme activity that is capable of isomerizing or
converting a glycerol molecule to the product 3-hydroxypropionaldehyde. For the
purposes of the present invention the dehydratase enzymes include a glycerol
dehydratase (GenBank U09771, U30903) and a diol dehydratase (GenBank
D45071) having preferred substrates of glycerol and 1,2-propanediol, respectively.
Glycerol dehydratase of K. pneumoniae ATCC 25955 is encoded by the genes
dhaB1, dhaB2, and dhaB3 identified as SEQ ID NOS:1, 2 and 3, respectively.
The dhaB1, dhaB2, and dhaB3 genes code for the α, β, and γ subunits of the
glycerol dehydratase enzyme, respectively.

The terms "oxidoreductase" or "1,3-propanediol oxidoreductase" refer to
the polypeptide(s) responsible for an enzyme activity that is capable of catalyzing
the reduction of 3-hydroxypropionaldehyde to 1,3-propanediol. 1,3-Propanediol
oxidoreductase includes, for example, the polypeptide encoded by the dhaT gene
(GenBank U09771, U30903) and is identified as SEQ ID NO:4.

The terms "glycerol-3-phosphate dehydrogenase" or "G3PDH" refer to the
polypeptide(s) responsible for an enzyme activity capable of catalyzing the
conversion of dihydroxyacetone phosphate (DHAP) to glycerol-3-phosphate
(G3P). In vivo G3PDH may be NADH-, NADPH-, or FAD-dependent. Examples
of this enzyme activity include the following: NADH-dependent enzymes
(EC 1.1.1.8) are encoded by several genes including GPD1 (GenBank Z74071x2)
or GPD2 (GenBank Z35169x1) or GPD3 (GenBank G984182) or DAR1
(GenBank Z74071x2); a NADPH-dependent enzyme (EC 1.1.1.94) is encoded by
gpsA (GenBank U32164, G466746 (cds 197911-196892), and L45246); and
FAD-dependent enzymes (EC 1.1.99.5) are encoded by GUT2 (GenBank
Z47047x23) or glpD (GenBank G147838) or glpABC (GenBank M20938).

The terms "glycerol-3-phosphatase" or "sn-glycerol-3-phosphatase" or
"d,l-glycerol phosphatase" or "G3P phosphatase" refer to the polypeptide(s)
responsible for an enzyme activity that is capable of catalyzing the conversion of
glycerol-3-phosphate to glycerol. G3P phosphatase includes, for example, the
polypeptides encoded by GPP1 (GenBank Z47047x125) or GPP2 (GenBank
U18813x11).
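
Read together, the four enzyme definitions above describe a linear route from dihydroxyacetone phosphate to 1,3-propanediol. The short Python sketch below restates that route as data and checks that each product feeds the next step. The `Step` class and helper names are illustrative assumptions; the substrates, products, and cofactor notes are taken from the definitions and background given in this specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    substrate: str
    product: str
    enzyme: str
    cofactor_note: str


# Order and cofactor notes follow the definitions in the text.
PATHWAY = [
    Step("dihydroxyacetone phosphate", "glycerol-3-phosphate",
         "G3PDH", "NADH-, NADPH-, or FAD-dependent"),
    Step("glycerol-3-phosphate", "glycerol",
         "G3P phosphatase", "none stated"),
    Step("glycerol", "3-hydroxypropionaldehyde",
         "glycerol dehydratase (dhaB)", "typically coenzyme B12-dependent"),
    Step("3-hydroxypropionaldehyde", "1,3-propanediol",
         "1,3-propanediol oxidoreductase (dhaT)", "NADH- or NADPH-dependent"),
]


def is_linear(pathway):
    """True when each step's product is the next step's substrate."""
    return all(a.product == b.substrate for a, b in zip(pathway, pathway[1:]))


def metabolites(pathway):
    """List the metabolites in order, from first substrate to final product."""
    return [pathway[0].substrate] + [step.product for step in pathway]


print(is_linear(PATHWAY))
print(" -> ".join(metabolites(PATHWAY)))
```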

The term "glycerol kinase" refers to the polypeptide(s) responsible for an
enzyme activity capable of catalyzing the conversion of glycerol to
glycerol-3-phosphate or glycerol-3-phosphate to glycerol, depending on reaction
conditions. Glycerol kinase includes, for example, the polypeptide encoded by
GUT1 (GenBank U11583x19).

The terms "GPD1", "DAR1", "OSG1", "D2830", and "YDL022W" will
be used interchangeably and refer to a gene that encodes a cytosolic
glycerol-3-phosphate dehydrogenase and characterized by the base sequence given
as SEQ ID NO:5.

The term "GPD2" refers to a gene that encodes a cytosolic
glycerol-3-phosphate dehydrogenase and characterized by the base sequence given
as SEQ ID NO:6.

The terms "GUT2" and "YIL155C" are used interchangeably and refer to a
gene that encodes a mitochondrial glycerol-3-phosphate dehydrogenase and
characterized by the base sequence given in SEQ ID NO:7.

The terms "GPP1", "RHR2" and "YIL053W" are used interchangeably and
refer to a gene that encodes a cytosolic glycerol-3-phosphatase and characterized
by the base sequence given as SEQ ID NO:8.

The terms "GPP2", "HOR2" and "YER062C" are used interchangeably and
refer to a gene that encodes a cytosolic glycerol-3-phosphatase and characterized
by the base sequence given as SEQ ID NO:9.

The term "GUT1" refers to a gene that encodes a cytosolic glycerol kinase
and characterized by the base sequence given as SEQ ID NO:10.

The terms "function" or "enzyme function" refer to the catalytic activity of
an enzyme in altering the energy required to perform a specific chemical reaction.
It is understood that such an activity may apply to a reaction in equilibrium where
the production of either product or substrate may be accomplished under suitable
conditions.

The terms "polypeptide" and "protein" are used interchangeably.

The terms "carbon substrate" and "carbon source" refer to a carbon source
capable of being metabolized by host organisms of the present invention and
particularly carbon sources selected from the group consisting of
monosaccharides, oligosaccharides, polysaccharides, and one-carbon substrates or
mixtures thereof.

The terms "host cell" or "host organism" refer to a microorganism capable 
of receiving foreign or heterologous genes and of expressing those genes to 
produce an active gene product. 

The terms "foreign gene", "foreign DNA", "heterologous gene" and
"heterologous DNA" refer to genetic material native to one organism that has
been placed within a host organism by various means.

The terms "recombinant organism" and "transformed host" refer to any
organism having been transformed with heterologous or foreign genes. The
recombinant organisms of the present invention express foreign genes encoding
glycerol-3-phosphate dehydrogenase (G3PDH) and glycerol-3-phosphatase (G3P
phosphatase), glycerol dehydratase (dhaB), and 1,3-propanediol oxidoreductase
(dhaT) for the production of 1,3-propanediol from suitable carbon substrates.

"Gene" refers to a nucleic acid fragment that expresses a specific protein,
including regulatory sequences preceding (5' non-coding) and following
(3' non-coding) the coding region. The terms "native" and "wild-type" refer to a
gene as found in nature with its own regulatory sequences.



The terms "encoding" and "coding" refer to the process by which a gene,
through the mechanisms of transcription and translation, produces an amino acid
sequence. It is understood that the process of encoding a specific amino acid
sequence includes DNA sequences that may involve base changes that do not
cause a change in the encoded amino acid, or which involve base changes which
may alter one or more amino acids, but do not affect the functional properties of
the protein encoded by the DNA sequence. It is therefore understood that the
invention encompasses more than the specific exemplary sequences.
Modifications to the sequence, such as deletions, insertions, or substitutions in the
sequence which produce silent changes that do not substantially affect the
functional properties of the resulting protein molecule are also contemplated. For
example, alterations in the gene sequence which reflect the degeneracy of the
genetic code, or which result in the production of a chemically equivalent amino
acid at a given site, are contemplated. Thus, a codon for the amino acid alanine, a
hydrophobic amino acid, may be substituted by a codon encoding another less
hydrophobic residue, such as glycine, or a more hydrophobic residue, such as
valine, leucine, or isoleucine. Similarly, changes which result in substitution of
one negatively charged residue for another, such as aspartic acid for glutamic acid,
or one positively charged residue for another, such as lysine for arginine, can also
be expected to produce a biologically equivalent product. Nucleotide changes
which result in alteration of the N-terminal and C-terminal portions of the protein
molecule would also not be expected to alter the activity of the protein. In some
cases, it may in fact be desirable to make mutants of the sequence in order to study
the effect of alteration on the biological activity of the protein. Each of the
proposed modifications is well within the routine skill in the art, as is
determination of retention of biological activity in the encoded products.
Moreover, the skilled artisan recognizes that sequences encompassed by this
invention are also defined by their ability to hybridize, under stringent conditions
(0.1X SSC, 0.1% SDS, 65 °C), with the sequences exemplified herein.

The term "expression" refers to the transcription and translation to gene
product from a gene coding for the sequence of the gene product.

The terms "plasmid", "vector", and "cassette" refer to an extrachromosomal
element often carrying genes which are not part of the central
metabolism of the cell, and usually in the form of circular double-stranded DNA
molecules. Such elements may be autonomously replicating sequences, genome
integrating sequences, phage or nucleotide sequences, linear or circular, of a
single- or double-stranded DNA or RNA, derived from any source, in which a
number of nucleotide sequences have been joined or recombined into a unique
construction which is capable of introducing a promoter fragment and DNA
sequence for a selected gene product along with appropriate 3' untranslated
sequence into a cell. "Transformation cassette" refers to a specific vector
containing a foreign gene and having elements in addition to the foreign gene that
facilitate transformation of a particular host cell. "Expression cassette" refers to a
specific vector containing a foreign gene and having elements in addition to the
foreign gene that allow for enhanced expression of that gene in a foreign host.
The terms "transformation" and "transfection" refer to the acquisition of
new genes in a cell after the incorporation of nucleic acid. The acquired genes
may be integrated into chromosomal DNA or introduced as extrachromosomal
replicating sequences. The term "transformant" refers to the product of a
transformation.

The term "genetically altered" refers to the process of changing hereditary
material by transformation or mutation.

CONSTRUCTION OF RECOMBINANT ORGANISMS

Recombinant organisms containing the necessary genes that will encode
the enzymatic pathway for the conversion of a carbon substrate to 1,3-propanediol
may be constructed using techniques well known in the art. In the present
invention, genes encoding glycerol-3-phosphate dehydrogenase (G3PDH),
glycerol-3-phosphatase (G3P phosphatase), glycerol dehydratase (dhaB), and
1,3-propanediol oxidoreductase (dhaT) were isolated from a native host such as
Klebsiella or Saccharomyces and used to transform host strains such as E. coli
DH5α, ECL707, AA200, or W1485; the Saccharomyces cerevisiae strain
YPH500; or the Klebsiella pneumoniae strains ATCC 25955 or ECL 2106.

Isolation of Genes

Methods of obtaining desired genes from a bacterial genome are common
and well known in the art of molecular biology. For example, if the sequence of
the gene is known, suitable genomic libraries may be created by restriction
endonuclease digestion and may be screened with probes complementary to the
desired gene sequence. Once the sequence is isolated, the DNA may be amplified
using standard primer directed amplification methods such as polymerase chain
reaction (PCR) (U.S. 4,683,202) to obtain amounts of DNA suitable for
transformation using appropriate vectors.

Alternatively, cosmid libraries may be created where large segments of
genomic DNA (35-45 kb) may be packaged into vectors and used to transform
appropriate hosts. Cosmid vectors are unique in being able to accommodate large
quantities of DNA. Generally, cosmid vectors have at least one copy of the cos
DNA sequence which is needed for packaging and subsequent circularization of
the foreign DNA. In addition to the cos sequence these vectors will also contain
an origin of replication such as ColE1 and drug resistance markers such as a gene
conferring resistance to ampicillin or neomycin. Methods of using cosmid vectors
for the transformation of suitable bacterial hosts are well described in Sambrook
et al., Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring
Harbor Laboratory Press, Cold Spring Harbor, NY (1989).

Typically, to clone cosmids, foreign DNA is isolated and ligated, using the
appropriate restriction endonucleases, adjacent to the cos region of the cosmid
vector. Cosmid vectors containing the linearized foreign DNA are then reacted
with a DNA packaging vehicle such as bacteriophage λ. During the packaging
process the cos sites are cleaved and the foreign DNA is packaged into the head
portion of the bacterial viral particle. These particles are then used to transfect
suitable host cells such as E. coli. Once injected into the cell, the foreign DNA
circularizes under the influence of the cos sticky ends. In this manner large
segments of foreign DNA can be introduced and expressed in recombinant host
cells.
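
As a rough guide to how many cosmid clones such a library might need, the standard Clarke-Carbon estimate, N = ln(1 - P) / ln(1 - I/G), can be applied to the 35-45 kb insert range mentioned above. The Python sketch below is illustrative only: the 5 Mb genome size is an assumed round figure for a bacterial genome and is not taken from this specification, and the function name is hypothetical.

```python
import math


def clones_for_coverage(insert_bp, genome_bp, probability):
    """Clarke-Carbon estimate of the number of independent clones needed so
    that any given sequence is present with the stated probability."""
    if not 0 < probability < 1:
        raise ValueError("probability must be strictly between 0 and 1")
    fraction = insert_bp / genome_bp
    if not 0 < fraction < 1:
        raise ValueError("insert must be smaller than the genome")
    return math.ceil(math.log(1 - probability) / math.log(1 - fraction))


ASSUMED_GENOME_BP = 5_000_000  # assumption for illustration, not from the text

for insert_bp in (35_000, 45_000):  # insert range given in the text
    n = clones_for_coverage(insert_bp, ASSUMED_GENOME_BP, 0.99)
    print(f"{insert_bp} bp inserts: about {n} clones for 99% coverage")
```

Larger inserts reduce the number of clones to be screened, which is one reason cosmids suit a functional screen of the kind described in the next section.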

Isolation and cloning of genes encoding glycerol dehydratase (dhaB) and
1,3-propanediol oxidoreductase (dhaT)

Cosmid vectors and cosmid transformation methods were used within the
context of the present invention to clone large segments of genomic DNA from
bacterial genera known to possess genes capable of processing glycerol to
1,3-propanediol. Specifically, genomic DNA from K. pneumoniae ATCC 25955
was isolated by methods well known in the art and digested with the restriction
enzyme Sau3A for insertion into a cosmid vector Supercos 1 and packaged using
GigapackII packaging extracts. Following construction of the vector, E. coli
XL1-Blue MR cells were transformed with the cosmid DNA. Transformants were
screened for the ability to convert glycerol to 1,3-propanediol by growing the cells
in the presence of glycerol and analyzing the media for 1,3-propanediol formation.

Two of the 1,3-propanediol positive transformants were analyzed and the
cosmids were named pKP1 and pKP2. DNA sequencing revealed extensive
homology to the glycerol dehydratase gene (dhaB) from C. freundii,
demonstrating that these transformants contained DNA encoding the glycerol
dehydratase gene. Other 1,3-propanediol positive transformants were analyzed
and the cosmids were named pKP4 and pKP5. DNA sequencing revealed that
these cosmids carried DNA encoding a diol dehydratase gene.

Although the instant invention utilizes the isolated genes from within a
Klebsiella cosmid, alternate sources of dehydratase genes include, but are not
limited to, Citrobacter, Clostridia, and Salmonella.

Genes encoding G3PDH and G3P phosphatase

The present invention provides genes suitable for the expression of 
G3PDH and G3P phosphatase activities in a host cell. 

Genes encoding G3PDH are known. For example, GPD1 has been
isolated from Saccharomyces and has the base sequence given by SEQ ID NO:5,
encoding the amino acid sequence given in SEQ ID NO:11 (Wang et al., supra).
Similarly, G3PDH activity has also been isolated from Saccharomyces, encoded
by GPD2 having the base sequence given in SEQ ID NO:6, encoding the amino
acid sequence given in SEQ ID NO:12 (Eriksson et al., Mol. Microbiol. 17, 95
(1995)).

It is contemplated that any gene encoding a polypeptide responsible for
G3PDH activity is suitable for the purposes of the present invention wherein that
activity is capable of catalyzing the conversion of dihydroxyacetone phosphate
(DHAP) to glycerol-3-phosphate (G3P). Further, it is contemplated that any gene
encoding the amino acid sequence of G3PDH as given by any one of SEQ ID
NOS:11, 12, 13, 14, 15 and 16 corresponding to the genes GPD1, GPD2, GUT2,
gpsA, glpD, and the α subunit of glpABC, respectively, will be functional in the
present invention wherein that amino acid sequence encompasses amino acid
substitutions, deletions or additions that do not alter the function of the enzyme. It
will be appreciated by the skilled person that genes encoding G3PDH isolated
from other sources are also suitable for use in the present invention. For
example, genes isolated from prokaryotes include GenBank accessions M34393,
M20938, L06231, U12567, L45246, L45323, L45324, L45325, U32164, and
U39682; genes isolated from fungi include GenBank accessions U30625, U30876
and X56162; genes isolated from insects include GenBank accessions X61223 and
X14179; and genes isolated from mammalian sources include GenBank
accessions U12424, M25558 and X78593.

Genes encoding G3P phosphatase are known. For example, GPP2 has
been isolated from Saccharomyces cerevisiae and has the base sequence given by
SEQ ID NO:9 which encodes the amino acid sequence given in SEQ ID NO:17
(Norbeck et al., J. Biol. Chem. 271, p. 13875, 1996).

It is contemplated that any gene encoding a G3P phosphatase activity is
suitable for the purposes of the present invention wherein that activity is capable
of catalyzing the conversion of glycerol-3-phosphate to glycerol. Further, it is
contemplated that any gene encoding the amino acid sequence of G3P
phosphatase as given by SEQ ID NOS:33 and 17 will be functional in the present
invention wherein that amino acid sequence encompasses amino acid
substitutions, deletions or additions that do not alter the function of the enzyme. It
will be appreciated by the skilled person that genes encoding G3P phosphatase
isolated from other sources are also suitable for use in the present invention. For
example, the dephosphorylation of glycerol-3-phosphate to yield glycerol may be
achieved with one or more of the following general or specific phosphatases:
alkaline phosphatase (EC 3.1.3.1) [GenBank M19159, M29663, U02550 or
M33965]; acid phosphatase (EC 3.1.3.2) [GenBank U51210, U19789, U28658 or
L20566]; glycerol-3-phosphatase (EC 3.1.3.-) [GenBank Z38060 or U18813x11];
glucose-1-phosphatase (EC 3.1.3.10) [GenBank M33807]; glucose-6-phosphatase
(EC 3.1.3.9) [GenBank U00445]; fructose-1,6-bisphosphatase (EC 3.1.3.11)
[GenBank X12545 or J03207] or phosphatidyl glycerophosphate phosphatase
(EC 3.1.3.27) [GenBank M23546 and M23628].

Genes encoding glycerol kinase are known. For example, GUT1 encoding
the glycerol kinase from Saccharomyces has been isolated and sequenced (Pavlik
et al., Curr. Genet. 24, 21, (1993)) and the base sequence is given by SEQ ID
NO:10 which encodes the amino acid sequence given in SEQ ID NO:18. It will
be appreciated by the skilled artisan that although glycerol kinase catalyzes the
degradation of glycerol in nature the same enzyme will be able to function in the
synthesis of glycerol to convert glycerol-3-phosphate to glycerol under the
appropriate reaction energy conditions. Evidence exists for glycerol production
through a glycerol kinase. Under anaerobic or respiration-inhibited conditions,
Trypanosoma brucei gives rise to glycerol in the presence of Glycerol-3-P and
ADP. The reaction occurs in the glycosome compartment (D. Hammond, J. Biol.
Chem. 260, 15646-15654, (1985)).
Host cells 

Suitable host cells for the recombinant production of glycerol by the
expression of G3PDH and G3P phosphatase may be either prokaryotic or
eukaryotic and will be limited only by their ability to express active enzymes.
Preferred hosts will be those typically useful for production of glycerol or
1,3-propanediol such as Citrobacter, Enterobacter, Clostridium, Klebsiella,
Aerobacter, Lactobacillus, Aspergillus, Saccharomyces, Schizosaccharomyces,
Zygosaccharomyces, Pichia, Kluyveromyces, Candida, Hansenula,
Debaryomyces, Mucor, Torulopsis, Methylobacter, Escherichia, Salmonella,
Bacillus, Streptomyces and Pseudomonas. Most preferred in the present invention
are E. coli, Klebsiella species and Saccharomyces species.

Adenosyl-cobalamin (coenzyme B12) is an essential cofactor for glycerol
dehydratase activity. The coenzyme is the most complex non-polymeric natural
product known, and its synthesis in vivo is directed using the products of about 30
genes. Synthesis of coenzyme B12 is found in prokaryotes, some of which are
able to synthesize the compound de novo, while others can perform partial
reactions. E. coli, for example, cannot fabricate the corrin ring structure, but is
able to catalyse the conversion of cobinamide to corrinoid and can introduce the
5'-deoxyadenosyl group.

Eukaryotes are unable to synthesize coenzyme B12 de novo and instead
transport vitamin B12 from the extracellular milieu with subsequent conversion of
the compound to its functional form by cellular enzymes. Three
enzyme activities have been described for this series of reactions:

1) aquacobalamin reductase (EC 1.6.99.8) reduces Co(III) to Co(II);

2) cob(II)alamin reductase (EC 1.6.99.9) reduces Co(II) to Co(I); and

3) cob(I)alamin adenosyltransferase (EC 2.5.1.17) transfers a 5'-deoxyadenosine
moiety from ATP to the reduced corrinoid. This last enzyme activity is the best
characterized of the three, and is encoded by cobA in S. typhimurium, btuR in
E. coli and cobO in P. denitrificans. These three cob(I)alamin adenosyltransferase
genes have been cloned and sequenced. Cob(I)alamin adenosyltransferase activity
has been detected in human fibroblasts and in isolated rat mitochondria (Fenton et
al., Biochem. Biophys. Res. Commun. 98, 283-9, (1981)). The two enzymes
involved in cobalt reduction are poorly characterized and gene sequences are not
available. There are reports of an aquacobalamin reductase from Euglena gracilis
(Watanabe et al., Arch. Biochem. Biophys. 305, 421-7, (1993)), and a
cob(III)alamin reductase is present in the microsomal and mitochondrial inner
membrane fractions from rat fibroblasts (Pezacka, Biochim. Biophys. Acta, 1157,
167-77, (1993)).

Supplementing culture media with vitamin B12 may satisfy the need to
produce coenzyme B12 for glycerol dehydratase activity in many microorganisms,
but in some cases additional catalytic activities may have to be added or increased
in vivo. Enhanced synthesis of coenzyme B12 in eukaryotes may be particularly
desirable. Given the published sequences for genes encoding cob(I)alamin
adenosyltransferase, the cloning and expression of this gene could be
accomplished by one skilled in the art. For example, it is contemplated that yeast,
such as Saccharomyces, could be constructed so as to contain genes encoding
cob(I)alamin adenosyltransferase in addition to the genes necessary to effect
conversion of a carbon substrate such as glucose to 1,3-propanediol. Cloning and
expression of the genes for cobalt reduction requires a different approach. This
could be based on a selection in E. coli for growth on ethanolamine as sole
nitrogen source. In the presence of coenzyme B12, ethanolamine ammonia-lyase
enables growth of cells in the absence of other nitrogen sources. If E. coli cells
contain a cloned gene for cob(I)alamin adenosyltransferase and random cloned
DNA from another organism, growth on ethanolamine in the presence of
aquacobalamin should be enhanced and selected for if the random cloned DNA
encodes cobalt reduction properties to facilitate adenosylation of aquacobalamin.

In addition to E. coli and Saccharomyces, Klebsiella is a particularly
preferred host. Strains of Klebsiella pneumoniae are known to produce
1,3-propanediol when grown on glycerol as the sole carbon source. It is
contemplated that Klebsiella can be genetically altered to produce
1,3-propanediol from monosaccharides, oligosaccharides, polysaccharides, or
one-carbon substrates.

In order to engineer such strains, it will be advantageous to provide the
Klebsiella host with the genes facilitating conversion of dihydroxyacetone
phosphate to glycerol and conversion of glycerol to 1,3-propanediol either
separately or together, under the transcriptional control of one or more constitutive
or inducible promoters. The introduction of the DAR1 and GPP2 genes encoding
glycerol-3-phosphate dehydrogenase and glycerol-3-phosphatase, respectively,
will provide Klebsiella with genetic machinery to produce 1,3-propanediol from
an appropriate carbon substrate.

The genes (e.g., G3PDH, G3P phosphatase, dhaB and/or dhaT) may be
introduced on any plasmid vector capable of replication in K. pneumoniae or they
may be integrated into the K. pneumoniae genome. For example, K. pneumoniae
ATCC 25955 and K. pneumoniae ECL 2106 are known to be sensitive to
tetracycline or chloramphenicol; thus plasmid vectors which are both capable of
replicating in K. pneumoniae and encoding resistance to either or both of these
antibiotics may be used to introduce these genes into K. pneumoniae. Methods of
transforming Klebsiella with genes of interest are common and well known in the
art, and suitable protocols, including appropriate vectors and expression techniques,
may be found in Sambrook, supra.
Vectors and expression cassettes 

The present invention provides a variety of vectors and transformation and
expression cassettes suitable for the cloning, transformation and expression of
G3PDH and G3P phosphatase into a suitable host cell. Suitable vectors will be
those which are compatible with the bacterium employed. Suitable vectors can be
derived, for example, from a bacterium, a virus (such as bacteriophage T7 or an
M-13 derived phage), a cosmid, a yeast or a plant. Protocols for obtaining and using
such vectors are known to those in the art (Sambrook et al., Molecular Cloning:
A Laboratory Manual, volumes 1, 2, 3, Cold Spring Harbor Laboratory, Cold
Spring Harbor, NY (1989)).

Typically, the vector or cassette contains sequences directing transcription
and translation of the relevant gene, a selectable marker, and sequences allowing
autonomous replication or chromosomal integration. Suitable vectors comprise a
region 5' of the gene which harbors transcriptional initiation controls and a region
3' of the DNA fragment which controls transcriptional termination. It is most
preferred when both control regions are derived from genes homologous to the
transformed host cell although it is to be understood that such control regions need
not be derived from the genes native to the specific species chosen as a production
host.

Initiation control regions or promoters, which are useful to drive
expression of the G3PDH and G3P phosphatase genes in the desired host cell, are
numerous and familiar to those skilled in the art. Virtually any promoter capable
of driving these genes is suitable for the present invention including but not
limited to CYC1, HIS3, GAL1, GAL10, ADH1, PGK, PHO5, GAPDH, ADC1,
TRP1, URA3, LEU2, ENO, TPI (useful for expression in Saccharomyces); AOX1
(useful for expression in Pichia); and lac, trp, λPL, λPR, T7, tac, and trc (useful
for expression in E. coli).

Termination control regions may also be derived from various genes native
to the preferred hosts. A termination site may be unnecessary; however, it is
most preferred if included.

For effective expression of the instant enzymes, DNA encoding the
enzymes is linked operably through initiation codons to selected expression
control regions such that expression results in the formation of the appropriate
messenger RNA.

Transformation of suitable hosts and expression of genes for the
production of 1,3-propanediol

Once suitable cassettes are constructed they are used to transform
appropriate host cells. Introduction of the cassette containing the genes encoding
glycerol-3-phosphate dehydrogenase (G3PDH) and glycerol-3-phosphatase (G3P
phosphatase), glycerol dehydratase (dhaB), and 1,3-propanediol oxidoreductase
(dhaT), either separately or together into the host cell may be accomplished by
known procedures such as by transformation (e.g., using calcium-permeabilized
cells, electroporation) or by transfection using a recombinant phage virus
(Sambrook et al., supra).

In the present invention, E. coli W2042 (ATCC 98188) containing the
genes encoding glycerol-3-phosphate dehydrogenase (G3PDH) and glycerol-3-
phosphatase (G3P phosphatase), glycerol dehydratase (dhaB), and 1,3-propanediol
oxidoreductase (dhaT) was created. Additionally, S. cerevisiae YPH500
(ATCC 74392) harboring plasmids pMCK10, pMCK17, pMCK30 and pMCK35
containing genes encoding glycerol-3-phosphate dehydrogenase (G3PDH) and
glycerol-3-phosphatase (G3P phosphatase), glycerol dehydratase (dhaB), and
1,3-propanediol oxidoreductase (dhaT) was constructed. Both the above-
mentioned transformed E. coli and Saccharomyces represent preferred
embodiments of the invention.
Media and Carbon Substrates:

Fermentation media in the present invention must contain suitable carbon
substrates. Suitable substrates may include but are not limited to monosaccharides
such as glucose and fructose, oligosaccharides such as lactose or sucrose,
polysaccharides such as starch or cellulose, or mixtures thereof, and unpurified
mixtures from renewable feedstocks such as cheese whey permeate, cornsteep
liquor, sugar beet molasses, and barley malt. Additionally, the carbon substrate
may also be one-carbon substrates such as carbon dioxide, or methanol for which
metabolic conversion into key biochemical intermediates has been demonstrated.

Glycerol production from single carbon sources (e.g., methanol,
formaldehyde, or formate) has been reported in methylotrophic yeasts (Yamada et
al., Agric. Biol. Chem., 53(2) 541-543, (1989)) and in bacteria (Hunter et al.,
Biochemistry, 24, 4148-4155, (1985)). These organisms can assimilate single
carbon compounds, ranging in oxidation state from methane to formate, and
produce glycerol. The pathway of carbon assimilation can be through ribulose
monophosphate, through serine, or through xylulose-monophosphate (Gottschalk,
Bacterial Metabolism, Second Edition, Springer-Verlag: New York (1986)). The
ribulose monophosphate pathway involves the condensation of formate with
ribulose-5-phosphate to form a 6 carbon sugar that becomes fructose and
eventually the three carbon product glyceraldehyde-3-phosphate. Likewise, the
serine pathway assimilates the one-carbon compound into the glycolytic pathway
via methylenetetrahydrofolate.

In addition to utilization of one and two carbon substrates, methylotrophic
organisms are also known to utilize a number of other carbon-containing
compounds such as methylamine, glucosamine and a variety of amino acids for
metabolic activity. For example, methylotrophic yeast are known to utilize the
carbon from methylamine to form trehalose or glycerol (Bellion et al., Microb.
Growth C1 Compd., [Int. Symp.], 7th (1993), 415-32. Editor(s): Murrell, J.
Collin; Kelly, Don P. Publisher: Intercept, Andover, UK). Similarly, various
species of Candida will metabolize alanine or oleic acid (Sulter et al., Arch.
Microbiol., 153(5), 485-9 (1990)). Hence, the source of carbon utilized in the
present invention may encompass a wide variety of carbon-containing substrates
and will only be limited by the requirements of the host organism.

Although it is contemplated that all of the above mentioned carbon
substrates and mixtures thereof are suitable in the present invention, preferred
carbon substrates are monosaccharides, oligosaccharides, polysaccharides, and
one-carbon substrates. More preferred are sugars such as glucose, fructose,
sucrose and single carbon substrates such as methanol and carbon dioxide. Most
preferred is glucose.

In addition to an appropriate carbon source, fermentation media must
contain suitable minerals, salts, cofactors, buffers and other components, known to
those skilled in the art, suitable for the growth of the cultures and promotion of the
enzymatic pathway necessary for glycerol production. Particular attention is
given to Co(II) salts and/or vitamin B12 or precursors thereof.
Culture Conditions:

Typically, cells are grown at 30 °C in appropriate media. Preferred growth
media in the present invention are common commercially prepared media such as
Luria Bertani (LB) broth, Sabouraud Dextrose (SD) broth or Yeast Malt Extract
(YM) broth. Other defined or synthetic growth media may also be used and the
appropriate medium for growth of the particular microorganism will be known by
someone skilled in the art of microbiology or fermentation science. Agents
known to modulate catabolite repression directly or indirectly, e.g., cyclic
adenosine 2':3'-monophosphate or cyclic adenosine 2':5'-monophosphate, may
also be incorporated into the reaction media. Similarly, agents known
to modulate enzymatic activities (e.g., sulphites, bisulphites and alkalis) that lead
to enhancement of glycerol production may be used in conjunction with or as an
alternative to genetic manipulations.

Suitable pH ranges for the fermentation are between pH 5.0 and pH 9.0,
where pH 6.0 to pH 8.0 is preferred as the range for the initial condition.

Reactions may be performed under aerobic or anaerobic conditions where 
anaerobic or microaerobic conditions are preferred. 
Batch and Continuous Fermentations:

The present process uses a batch method of fermentation. A classical
batch fermentation is a closed system where the composition of the media is set at
the beginning of the fermentation and not subject to artificial alterations during the
fermentation. Thus, at the beginning of the fermentation the media is inoculated
with the desired organism or organisms and fermentation is permitted to occur
adding nothing to the system. Typically, however, a batch fermentation is "batch"
with respect to the addition of the carbon source and attempts are often made at
controlling factors such as pH and oxygen concentration. The metabolite and
biomass compositions of the batch system change constantly up to the time the
fermentation is stopped. Within batch cultures cells moderate through a static lag
phase to a high growth log phase and finally to a stationary phase where growth
rate is diminished or halted. If untreated, cells in the stationary phase will
eventually die. Cells in log phase generally are responsible for the bulk of
production of end product or intermediate.

A variation on the standard batch system is the Fed-Batch fermentation
system which is also suitable in the present invention. In this variation of a
typical batch system, the substrate is added in increments as the fermentation
progresses. Fed-Batch systems are useful when catabolite repression is apt to
inhibit the metabolism of the cells and where it is desirable to have limited
amounts of substrate in the media. Measurement of the actual substrate
concentration in Fed-Batch systems is difficult and is therefore estimated on the
basis of the changes of measurable factors such as pH, dissolved oxygen and the
partial pressure of waste gases such as CO2. Batch and Fed-Batch fermentations
are common and well known in the art and examples may be found in Brock,
supra.

It is also contemplated that the method would be adaptable to continuous
fermentation methods. Continuous fermentation is an open system where a
defined fermentation media is added continuously to a bioreactor and an equal
amount of conditioned media is removed simultaneously for processing.
Continuous fermentation generally maintains the cultures at a constant high
density where cells are primarily in log phase growth.

Continuous fermentation allows for the modulation of one factor or any
number of factors that affect cell growth or end product concentration. For
example, one method will maintain a limiting nutrient such as the carbon source
or nitrogen level at a fixed rate and allow all other parameters to moderate. In
other systems a number of factors affecting growth can be altered continuously
while the cell concentration, measured by media turbidity, is kept constant.
Continuous systems strive to maintain steady state growth conditions and thus the
cell loss due to media being drawn off must be balanced against the cell growth
rate in the fermentation. Methods of modulating nutrients and growth factors for
continuous fermentation processes as well as techniques for maximizing the rate
of product formation are well known in the art of industrial microbiology and a
variety of methods are detailed by Brock, supra.
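
The balance between cell removal and cell growth described above can be
illustrated numerically. The following Python fragment is a minimal sketch based
on the textbook chemostat relations (dilution rate D = F/V and net biomass change
dX/dt = (mu - D) X). These relations, the function names and the numerical values
are assumptions introduced for illustration only; they are not taken from the
present disclosure.

```python
def dilution_rate(feed_rate_l_per_h, working_volume_l):
    """Dilution rate D (1/h) = volumetric feed rate / culture volume."""
    if working_volume_l <= 0:
        raise ValueError("working volume must be positive")
    return feed_rate_l_per_h / working_volume_l


def net_biomass_rate(specific_growth_rate, dilution, biomass_g_per_l):
    """dX/dt = (mu - D) * X.

    Zero when growth exactly balances the cells drawn off (steady state),
    negative when cells are removed faster than they grow (washout).
    """
    return (specific_growth_rate - dilution) * biomass_g_per_l


if __name__ == "__main__":
    # Hypothetical operating point: 0.5 L/h feed into a 2.0 L culture.
    d = dilution_rate(feed_rate_l_per_h=0.5, working_volume_l=2.0)
    print("D (1/h):", d)
    print("dX/dt with mu equal to D:", net_biomass_rate(d, d, 4.0))
    print("dX/dt with mu below D:", net_biomass_rate(0.10, d, 4.0))
```

In this simplified picture, a steady state corresponds to a specific growth rate
equal to the dilution rate.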

It is contemplated that the present invention may be practiced using batch,
fed-batch or continuous processes and that any known mode of fermentation
would be suitable. Additionally, it is contemplated that cells may be immobilized
on a substrate as whole cell catalysts and subjected to fermentation conditions for
1,3-propanediol production.

Alterations in the 1,3-propanediol production pathway:

Representative enzyme pathway. The production of 1,3-propanediol from
glucose can be accomplished by the following series of steps. This series is
representative of a number of pathways known to those skilled in the art. Glucose
is converted in a series of steps by enzymes of the glycolytic pathway to
dihydroxyacetone phosphate (DHAP) and 3-phosphoglyceraldehyde (3-PG).
Glycerol is then formed by either hydrolysis of DHAP to dihydroxyacetone
(DHA) followed by reduction, or reduction of DHAP to glycerol 3-phosphate
(G3P) followed by hydrolysis. The hydrolysis step can be catalyzed by any
number of cellular phosphatases which are known to be specific or non-specific
with respect to their substrates or the activity can be introduced into the host by
recombination. The reduction step can be catalyzed by a NAD+ (or NADP+)
linked host enzyme or the activity can be introduced into the host by
recombination. It is notable that the dha regulon contains a glycerol
dehydrogenase (E.C. 1.1.1.6) which catalyzes the reversible reaction of
Equation 3.

Glycerol -> 3-HP + H2O                              (Equation 1)

3-HP + NADH + H+ -> 1,3-Propanediol + NAD+          (Equation 2)

Glycerol + NAD+ -> DHA + NADH + H+                  (Equation 3)

Glycerol is converted to 1,3-propanediol via the intermediate 3-hydroxy-
propionaldehyde (3-HP) as has been described in detail above. The intermediate
3-HP is produced from glycerol (Equation 1) by a dehydratase enzyme which can
be encoded by the host or can be introduced into the host by recombination. This
dehydratase can be glycerol dehydratase (E.C. 4.2.1.30), diol dehydratase
(E.C. 4.2.1.28), or any other enzyme able to catalyze this transformation.
Glycerol dehydratase, but not diol dehydratase, is encoded by the dha regulon.
1,3-Propanediol is produced from 3-HP (Equation 2) by a NAD+- (or NADP+-)
linked host enzyme or the activity can be introduced into the host by recombination.
This final reaction in the production of 1,3-propanediol can be catalyzed by
1,3-propanediol dehydrogenase (E.C. 1.1.1.202) or other alcohol dehydrogenases.

Mutations and transformations that affect carbon channeling. A variety of mutant
organisms comprising variations in the 1,3-propanediol production pathway will
be useful in the present invention. The introduction of a triosephosphate
isomerase mutation (tpi-) into the microorganism is an example of the use of a
mutation to improve the performance by carbon channeling. Alternatively,
mutations which diminish the production of ethanol (adh) or lactate (ldh) will
increase the availability of NADH for the production of 1,3-propanediol.
Additional mutations in steps of glycolysis after glyceraldehyde-3-phosphate such
as phosphoglycerate mutase (pgm) would be useful to increase the flow of carbon
to the 1,3-propanediol production pathway. Mutations that affect glucose
transport such as PTS which would prevent loss of PEP may also prove useful.
Mutations which block alternate pathways for intermediates of the
1,3-propanediol production pathway such as the glycerol catabolic pathway (glp)
would also be useful to the present invention. The mutation can be directed
toward a structural gene so as to impair or improve the activity of an enzymatic
activity or can be directed toward a regulatory gene so as to modulate the
expression level of an enzymatic activity.

Alternatively, transformations and mutations can be combined so as to
control particular enzyme activities for the enhancement of 1,3-propanediol
production. Thus it is within the scope of the present invention to anticipate
modifications of a whole cell catalyst which lead to an increased production of
1,3-propanediol.

Identification and purification of 1,3-propanediol:

Methods for the purification of 1,3-propanediol from fermentation media
are known in the art. For example, propanediols can be obtained from cell media
by subjecting the reaction mixture to extraction with an organic solvent,
distillation and column chromatography (U.S. 5,356,812). A particularly good
organic solvent for this process is cyclohexane (U.S. 5,008,473).

1,3-Propanediol may be identified directly by submitting the media to high
pressure liquid chromatography (HPLC) analysis. Preferred in the present
invention is a method where fermentation media is analyzed on an analytical ion
exchange column using a mobile phase of 0.01 N sulfuric acid in an isocratic
fashion.

Identification and purification of G3PDH and G3P phosphatase:

The levels of expression of the proteins G3PDH and G3P phosphatase are
measured by enzyme assays. The G3PDH activity assay relies on the spectral
properties of the cosubstrate, NADH, in the DHAP conversion to G-3-P. NADH
has intrinsic UV/vis absorption and its consumption can be monitored
spectrophotometrically at 340 nm. G3P phosphatase activity can be measured by
any method of measuring the inorganic phosphate liberated in the reaction. The
most commonly used detection method uses the visible spectroscopic
determination of a blue-colored phosphomolybdate ammonium complex.
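
As a worked illustration of the spectrophotometric G3PDH assay described above,
the following Python sketch converts a rate of absorbance decrease at 340 nm into
an enzyme activity using the Beer-Lambert law. The extinction coefficient of NADH
(6.22 per mM per cm at 340 nm) is a standard literature value; the function name,
the 1 cm path length and the sample numbers are assumptions made for illustration
and are not taken from the present disclosure.

```python
NADH_EXTINCTION_PER_MM_CM = 6.22  # at 340 nm, standard literature value


def g3pdh_activity_u_per_ml(delta_a340_per_min, assay_volume_ml,
                            sample_volume_ml, path_length_cm=1.0):
    """Return G3PDH activity in U per mL of enzyme sample.

    One unit (U) is taken here as 1 umol NADH oxidized per minute
    (an assumed, conventional definition).
    delta_a340_per_min is the magnitude of the absorbance decrease per minute.
    """
    if delta_a340_per_min < 0 or sample_volume_ml <= 0:
        raise ValueError("rate must be non-negative and sample volume positive")
    # Beer-Lambert: rate in mM/min, which equals umol per mL per min.
    rate_umol_per_ml_min = delta_a340_per_min / (
        NADH_EXTINCTION_PER_MM_CM * path_length_cm)
    # Scale from the cuvette volume back to the volume of sample added.
    return rate_umol_per_ml_min * assay_volume_ml / sample_volume_ml


if __name__ == "__main__":
    # Hypothetical reading: A340 falls by 0.0622 per minute in a 1.0 mL assay
    # that contains 0.05 mL of cell-free extract.
    print(g3pdh_activity_u_per_ml(0.0622, 1.0, 0.05))
```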

EXAMPLES
GENERAL METHODS 

Procedures for phosphorylations, ligations and transformations are well
known in the art. Techniques suitable for use in the following examples may be
found in Sambrook, J. et al., Molecular Cloning: A Laboratory Manual, Second
Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY (1989).

Materials and methods suitable for the maintenance and growth of
bacterial cultures are well known in the art. Techniques suitable for use in the
following examples may be found as set out in Manual of Methods for General
Bacteriology (Phillipp Gerhardt, R. G. E. Murray, Ralph N. Costilow, Eugene W.
Nester, Willis A. Wood, Noel R. Krieg and G. Briggs Phillips, eds), American
Society for Microbiology, Washington, DC (1994) or by Thomas D. Brock in
Biotechnology: A Textbook of Industrial Microbiology, Second Edition, Sinauer
Associates, Inc., Sunderland, MA (1989). All reagents and materials used for the
growth and maintenance of bacterial cells were obtained from Aldrich Chemicals
(Milwaukee, WI), DIFCO Laboratories (Detroit, MI), GIBCO/BRL (Gaithersburg,
MD), or Sigma Chemical Company (St. Louis, MO) unless otherwise specified.

The meaning of abbreviations is as follows: "h" means hour(s), "min"
means minute(s), "sec" means second(s), "d" means day(s), "mL" means
milliliters, "L" means liters.

ENZYME ASSAYS

Glycerol dehydratase activity in cell-free extracts was determined using
1,2-propanediol as substrate. The assay, based on the reaction of aldehydes with
methylbenzo-2-thiazolone hydrazone, has been described by Forage and Foster
(Biochim. Biophys. Acta, 569, 249 (1979)). The activity of 1,3-propanediol
oxidoreductase, sometimes referred to as 1,3-propanediol dehydrogenase, was
determined in solution or in slab gels using 1,3-propanediol and NAD+ as
substrates as has also been described (Johnson and Lin, J. Bacteriol., 169, 2050
(1987)). NADH or NADPH dependent glycerol 3-phosphate dehydrogenase
(G3PDH) activity was determined spectrophotometrically, following the
disappearance of NADH or NADPH as has been described (R. M. Bell and J. E.
Cronan, Jr., J. Biol. Chem. 250:7153-8 (1975)).
Assay for glycerol-3-phosphatase, GPP

The assay for enzyme activity was performed by incubating the extract
with an organic phosphate substrate in a bis-Tris or MES and magnesium buffer,
pH 6.5. The substrate used was l-α-glycerol phosphate or d,l-α-glycerol phosphate.
The final concentrations of the reagents in the assay were: buffer (20 mM bis-Tris
or 50 mM MES); MgCl2 (10 mM); and substrate (20 mM). If the total protein in
the sample was low and no visible precipitation occurred with an acid quench, the
sample was conveniently assayed in the cuvette. This method involved incubating
an enzyme sample in a cuvette that contained 20 mM substrate (50 uL, 200 mM),
50 mM MES, 10 mM MgCl2, pH 6.5 buffer. The final phosphatase assay volume
was 0.5 mL. The enzyme-containing sample was added to the reaction mixture;
the contents of the cuvette were mixed and then the cuvette was placed in a
circulating water bath at T = 37 °C for 5 to 120 min, the incubation time depending
on the phosphatase activity in the enzyme sample, which ranged from 2 to
0.02 U/mL. The enzymatic reaction was quenched by the addition of the acid
molybdate reagent (0.4 mL). After the Fiske SubbaRow reagent (0.1 mL) and
distilled water (1.5 mL) were added, the solution was mixed and allowed to
develop. After 10 min, the absorbance of the samples was read at 660 nm using a
Cary 219 UV/Vis spectrophotometer. The amount of inorganic phosphate released
was compared to a standard curve that was prepared by using a stock inorganic
phosphate solution (0.65 mM) and preparing 6 standards with final inorganic
phosphate concentrations ranging from 0.026 to 0.130 umol/mL.
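
The standard-curve comparison described above can be sketched as a simple
linear regression. The Python fragment below is illustrative only: the even
spacing of the six standards, the absorbance readings, the assumption that the
standard concentrations refer to the final 2.5 mL developed volume
(0.5 + 0.4 + 0.1 + 1.5 mL), the unit definition and all function names are
hypothetical choices that are not stated in the present disclosure.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def final_concentration_mm(stock_mm, added_ml, total_ml):
    """Concentration after dilution, e.g. 50 uL of 200 mM into 0.5 mL."""
    return stock_mm * added_ml / total_ml


def phosphatase_activity_u_per_ml(a660, slope, intercept, developed_volume_ml,
                                  incubation_min, sample_volume_ml):
    """Activity in U/mL of sample, with 1 U = 1 umol phosphate per min (assumed)."""
    conc_umol_per_ml = (a660 - intercept) / slope
    released_umol = conc_umol_per_ml * developed_volume_ml
    return released_umol / (incubation_min * sample_volume_ml)


# Six assumed, evenly spaced standards spanning 0.026 to 0.130 umol/mL.
STANDARDS = [0.026, 0.0468, 0.0676, 0.0884, 0.1092, 0.130]
# Hypothetical A660 readings (invented for this example, not measured data).
READINGS = [0.140, 0.244, 0.348, 0.452, 0.556, 0.660]

if __name__ == "__main__":
    slope, intercept = fit_line(STANDARDS, READINGS)
    print("substrate, mM:", final_concentration_mm(200.0, 0.05, 0.5))
    print("activity, U/mL:", phosphatase_activity_u_per_ml(
        0.31, slope, intercept, 2.5, 30.0, 0.05))
```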
Isolation and Identification of 1,3-propanediol

The conversion of glycerol to 1,3-propanediol was monitored by HPLC.
Analyses were performed using standard techniques and materials available to one
skilled in the art of chromatography. One suitable method utilized a Waters
Maxima 820 HPLC system using UV (210 nm) and RI detection. Samples were
injected onto a Shodex SH-1011 column (8 mm x 300 mm, purchased from
Waters, Milford, MA) equipped with a Shodex SH-1011P precolumn (6 mm x
50 mm), temperature controlled at 50 °C, using 0.01 N H2SO4 as mobile phase at
a flow rate of 0.5 mL/min. When quantitative analysis was desired, samples were
prepared with a known amount of trimethylacetic acid as external standard.
Typically, the retention times of glycerol (RI detection), 1,3-propanediol (RI
detection), and trimethylacetic acid (UV and RI detection) were 20.67 min,
26.08 min, and 35.03 min, respectively.
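
As an illustration of how the typical retention times might be used to label peaks in a chromatogram, a minimal Python sketch follows. The matching window of 0.3 min and the function name are hypothetical choices, not part of the described method; only the three retention times are taken from the text.

```python
# Typical retention times (min) quoted in the text for the Shodex SH-1011
# column under the stated conditions.
TYPICAL_RT_MIN = {
    "glycerol": 20.67,
    "1,3-propanediol": 26.08,
    "trimethylacetic acid": 35.03,
}


def assign_peak(retention_min, tolerance_min=0.3):
    """Return the closest compound within the tolerance window, else None.

    The default tolerance is an assumed value for illustration only.
    """
    best_name = None
    best_delta = tolerance_min
    for name, typical in TYPICAL_RT_MIN.items():
        delta = abs(retention_min - typical)
        if delta <= best_delta:
            best_name, best_delta = name, delta
    return best_name
```

Retention times drift with column age and temperature, so any real assignment would be checked against authentic standards run on the same day.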

Production of 1,3-propanediol was confirmed by GC/MS. Analyses were
performed using standard techniques and materials available to one of skill in the
art of GC/MS. One suitable method utilized a Hewlett Packard 5890 Series II gas
chromatograph coupled to a Hewlett Packard 5971 Series mass selective detector
(EI) and a HP-INNOWax column (30 m length, 0.25 mm i.d., 0.25 micron film
thickness). The retention time and mass spectrum of the 1,3-propanediol generated
were compared to those of authentic 1,3-propanediol (m/e: 57, 58).

An alternative method for GC/MS involved derivatization of the sample.
To 1.0 mL of sample (e.g., culture supernatant) was added 30 uL of concentrated
(70% v/v) perchloric acid. After mixing, the sample was frozen and lyophilized.
A 1:1 mixture of bis(trimethylsilyl)trifluoroacetamide:pyridine (300 uL) was
added to the lyophilized material, mixed vigorously and placed at 65 °C for one h.
The sample was clarified of insoluble material by centrifugation. The resulting
liquid partitioned into two phases, the upper of which was used for analysis. The
sample was chromatographed on a DB-5 column (48 m, 0.25 mm I.D., 0.25 um
film thickness; from J&W Scientific) and the retention time and mass spectrum of
the 1,3-propanediol derivative obtained from culture supernatants were compared
to that obtained from authentic standards. The mass spectrum of TMS-derivatized
1,3-propanediol contains the characteristic ions of 205, 177, 130 and 115 AMU.
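
As a plausibility check on the ion list, the nominal mass of the derivative can be computed. The sketch below assumes that both hydroxyl groups are silylated (giving C9H24O2Si2) and that the 205 AMU ion arises from loss of a methyl group from the molecular ion, which is common for TMS derivatives but is not stated in the text. The remaining ions are not assigned here.

```python
# Nominal (integer) masses of the most abundant isotopes.
NOMINAL = {"C": 12, "H": 1, "O": 16, "Si": 28}

PROPANEDIOL = {"C": 3, "H": 8, "O": 2}  # 1,3-propanediol, C3H8O2


def nominal_mass(formula):
    """Sum nominal atomic masses over an element -> count mapping."""
    return sum(NOMINAL[element] * count for element, count in formula.items())


def silylate(formula, n_groups):
    """Replace n hydroxyl hydrogens with Si(CH3)3 groups.

    Each substitution removes one H and adds Si + 3 C + 9 H,
    a net change of +3 C, +8 H, +1 Si.
    """
    out = dict(formula)
    out["C"] = out.get("C", 0) + 3 * n_groups
    out["H"] = out.get("H", 0) + 8 * n_groups
    out["Si"] = out.get("Si", 0) + n_groups
    return out


# ASSUMPTION: both hydroxyl groups carry a TMS group.
BIS_TMS = silylate(PROPANEDIOL, 2)
MOLECULAR_ION = nominal_mass(BIS_TMS)
# ASSUMPTION: the 205 AMU ion is the molecular ion minus CH3.
M_MINUS_METHYL = MOLECULAR_ION - nominal_mass({"C": 1, "H": 3})
```

Under these assumptions the molecular ion has nominal mass 220 and the methyl-loss fragment 205, consistent with the heaviest ion listed.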

EXAMPLE 1

CLONING AND TRANSFORMATION OF E. COLI HOST CELLS WITH
COSMID DNA FOR THE EXPRESSION OF 1,3-PROPANEDIOL

Media

Synthetic S12 medium was used in the screening of bacterial transformants
for the ability to make 1,3-propanediol. S12 medium contains: 10 mM
ammonium sulfate, 50 mM potassium phosphate buffer, pH 7.0, 2 mM MgCl2,
0.7 mM CaCl2, 50 uM MnCl2, 1 uM FeCl3, 1 uM ZnCl, 1.7 uM CuSO4, 2.5 uM
CoCl2, 2.4 uM Na2MoO4, and 2 uM thiamine hydrochloride.

Medium A used for growth and fermentation consisted of: 10 mM
ammonium sulfate; 50 mM MOPS/KOH buffer, pH 7.5; 5 mM potassium
phosphate buffer, pH 7.5; 2 mM MgCl2; 0.7 mM CaCl2; 50 uM MnCl2; 1 uM
FeCl3; 1 uM ZnCl; 1.72 uM CuSO4; 2.53 uM CoCl2; 2.42 uM Na2MoO4; 2 uM
thiamine hydrochloride; 0.01% yeast extract; 0.01% casamino acids; 0.8 ug/mL
vitamin B12; and 50 ug/mL amp. Medium A was supplemented with either 0.2%
glycerol or 0.2% glycerol plus 0.2% D-glucose as required.
Cells:

Klebsiella pneumoniae ECL2106 (Ruch et al., J. Bacteriol., 124, 348
(1975)), also known in the literature as K. aerogenes or Aerobacter aerogenes,
was obtained from E. C. C. Lin (Harvard Medical School, Cambridge, MA) and
was maintained as a laboratory culture.

Klebsiella pneumoniae ATCC 25955 was purchased from American Type
Culture Collection (Rockville, MD).

E. coli DH5α was purchased from Gibco/BRL and was transformed with
the cosmid DNA isolated from Klebsiella pneumoniae ATCC 25955 containing a
gene coding for either a glycerol or diol dehydratase enzyme. Cosmids containing
the glycerol dehydratase were identified as pKP1 and pKP2 and the cosmid
containing the diol dehydratase enzyme was identified as pKP4. Transformed
DH5α cells were identified as DH5α-pKP1, DH5α-pKP2, and DH5α-pKP4.

E. coli ECL707 (Sprenger et al., J. Gen. Microbiol., 135, 1255 (1989)) was
obtained from E. C. C. Lin (Harvard Medical School, Cambridge, MA) and was
similarly transformed with cosmid DNA from Klebsiella pneumoniae. These
transformants were identified as ECL707-pKP1 and ECL707-pKP2, containing
the glycerol dehydratase gene, and ECL707-pKP4, containing the diol dehydratase
gene.

E. coli AA200 containing a mutation in the tpi gene (Anderson et al.,
J. Gen. Microbiol., 62, 329 (1970)) was purchased from the E. coli Genetic Stock
Center, Yale University (New Haven, CT) and was transformed with Klebsiella
cosmid DNA to give the recombinant organisms AA200-pKP1 and AA200-pKP2,
containing the glycerol dehydratase gene, and AA200-pKP4, containing the diol
dehydratase gene.
DH5α:

Six transformation plates containing approximately 1,000 colonies of
E. coli XL1-Blue MR transfected with K. pneumoniae DNA were washed with
5 mL LB medium and centrifuged. The bacteria were pelleted and resuspended in
5 mL LB medium + glycerol. An aliquot (50 uL) was inoculated into a 15 mL
tube containing S12 synthetic medium with 0.2% glycerol + 400 ng per mL of
vitamin B12 + 0.001% yeast extract + 50amp. The tube was filled with the
medium to the top and wrapped with parafilm and incubated at 30 °C. A slight
turbidity was observed after 48 h. Aliquots, analyzed for product distribution as
described above at 78 h and 132 h, were positive for 1,3-propanediol, the later
time points containing increased amounts of 1,3-propanediol.

The bacteria, testing positive for 1,3-propanediol production, were serially
diluted and plated onto LB-50amp plates in order to isolate single colonies.
Forty-eight single colonies were isolated and checked again for the production of
1,3-propanediol. Cosmid DNA was isolated from 6 independent clones and
transformed into E. coli strain DH5α. The transformants were again checked for
the production of 1,3-propanediol. Two transformants were characterized further
and designated as DH5α-pKP1 and DH5α-pKP2.

A 12.1 kb EcoRI-SalI fragment from pKP1, subcloned into pIBI31 (IBI
Biosystem, New Haven, CT), was sequenced and termed pHK28-26 (SEQ ID
NO:19). Sequencing revealed the loci of the relevant open reading frames of the
dha operon encoding glycerol dehydratase and genes necessary for regulation.
Referring to SEQ ID NO:19, a fragment of the open reading frame for dhaK
encoding dihydroxyacetone kinase is found at bases 1-399; the open reading frame
dhaD encoding glycerol dehydrogenase is found at bases 983-2107; the open
reading frame dhaR encoding the repressor is found at bases 2209-4134; the open
reading frame dhaT encoding 1,3-propanediol oxidoreductase is found at bases
5017-6180; the open reading frame dhaB1 encoding the alpha subunit glycerol
dehydratase is found at bases 7044-8711; the open reading frame dhaB2 encoding
the beta subunit glycerol dehydratase is found at bases 8724-9308; the open
reading frame dhaB3 encoding the gamma subunit glycerol dehydratase is found
at bases 9311-9736; and the open reading frame dhaBX, encoding a protein of
unknown function, is found at bases 9749-11572.
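
The base ranges listed above can be checked for internal consistency with a short Python sketch. The coordinates are those given for SEQ ID NO:19; the helper names are illustrative only, and whether each range includes the stop codon is not stated in the text, so the codon counts are an upper bound on the number of encoded residues.

```python
# Complete open reading frames on pHK28-26 (SEQ ID NO:19), as 1-based
# inclusive (start, end) base positions taken from the text. The partial
# dhaK fragment (bases 1-399) is left out because it is not a full ORF.
ORFS = {
    "dhaD": (983, 2107),
    "dhaR": (2209, 4134),
    "dhaT": (5017, 6180),
    "dhaB1": (7044, 8711),
    "dhaB2": (8724, 9308),
    "dhaB3": (9311, 9736),
    "dhaBX": (9749, 11572),
}


def orf_length_bp(start, end):
    """Length in base pairs of a 1-based inclusive range."""
    return end - start + 1


def codon_count(start, end):
    """Number of codons; raises if the range is not a multiple of three."""
    length = orf_length_bp(start, end)
    if length % 3 != 0:
        raise ValueError(f"range {start}-{end} is not a whole number of codons")
    return length // 3


CODONS = {name: codon_count(*span) for name, span in ORFS.items()}
```

Every listed range is a whole number of codons, which supports the corrected reading of the coordinates.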

Single colonies of E. coli XL1-Blue MR transfected with packaged cosmid
DNA from K. pneumoniae were inoculated into microtiter wells containing
200 uL of S15 medium (ammonium sulfate, 10 mM; potassium phosphate buffer,
pH 7.0, 1 mM; MOPS/KOH buffer, pH 7.0, 50 mM; MgCl2, 2 mM; CaCl2,
0.7 mM; MnCl2, 50 uM; FeCl3, 1 uM; ZnCl, 1 uM; CuSO4, 1.72 uM; CoCl2,
2.53 uM; Na2MoO4, 2.42 uM; and thiamine hydrochloride, 2 uM) + 0.2%
glycerol + 400 ng/mL of vitamin B12 + 0.001% yeast extract + 50 ug/mL
ampicillin. In addition to the microtiter wells, a master plate containing
LB-50 amp was also inoculated. After 96 h, 100 uL was withdrawn and
centrifuged in a Rainin microfuge tube containing a 0.2 micron nylon membrane
filter. Bacteria were retained and the filtrate was processed for HPLC analysis.
Positive clones demonstrating 1,3-propanediol production were identified after
screening approximately 240 colonies. Three positive clones were identified, two
of which had grown on LB-50 amp and one of which had not. A single colony,
isolated from one of the two positive clones grown on LB-50 amp and verified for
the production of 1,3-propanediol, was designated as pKP4. Cosmid DNA was
isolated from E. coli strains containing pKP4 and E. coli strain DH5α was
transformed. An independent transformant, designated as DH5α-pKP4, was
verified for the production of 1,3-propanediol.

ECL707:

E. coli strain ECL707 was transformed with cosmid K. pneumoniae DNA
corresponding to one of pKP1, pKP2, pKP4 or the Supercos vector alone and
named ECL707-pKP1, ECL707-pKP2, ECL707-pKP4, and ECL707-sc,
respectively. ECL707 is defective in glpK, gld, and ptsD which encode the
ATP-dependent glycerol kinase, NAD+-linked glycerol dehydrogenase, and
enzyme II for dihydroxyacetone of the phosphoenolpyruvate-dependent
phosphotransferase system, respectively.

Twenty single colonies of each cosmid transformation and five of the
Supercos vector alone (negative control) transformation, isolated from LB-50 amp
plates, were transferred to a master LB-50 amp plate. These isolates were also
tested for their ability to convert glycerol to 1,3-propanediol in order to determine
if they contained dehydratase activity. The transformants were transferred with a
sterile toothpick to microtiter plates containing 200 uL of Medium A
supplemented with either 0.2% glycerol or 0.2% glycerol plus 0.2% D-glucose.
After incubation for 48 hr at 30 °C, the contents of the microtiter plate wells were
filtered through an 0.45 micron nylon filter and chromatographed by HPLC. The
results of these tests are given in Table 1.

Table 1

Conversion of glycerol to 1,3-propanediol by transformed ECL707

Transformant     Glycerol*     Glycerol plus Glucose*

ECL707-pKP1      19/20         19/20
ECL707-pKP2      18/20         20/20
ECL707-pKP4      0/20          20/20
ECL707-sc        0/5           0/5

*(Number of positive isolates/number of isolates tested)

AA200:

E. coli strain AA200 was transformed with cosmid K. pneumoniae DNA
corresponding to one of pKP1, pKP2, pKP4 and the Supercos vector alone and
named AA200-pKP1, AA200-pKP2, AA200-pKP4, and AA200-sc, respectively.
Strain AA200 is defective in triosephosphate isomerase (tpi-).

Twenty single colonies of each cosmid transformation and five of the
empty vector transformation were isolated and tested for their ability to convert
glycerol to 1,3-propanediol as described for E. coli strain ECL707. The results of
these tests are given in Table 2.

Table 2

Conversion of glycerol to 1,3-propanediol by transformed AA200

Transformant     Glycerol*     Glycerol plus Glucose*

*(Number of positive isolates/number of isolates tested)

EXAMPLE 2

CONVERSION OF D-GLUCOSE TO 1,3-PROPANEDIOL BY
RECOMBINANT E. coli USING DAR1, GPP2, dhaB, and dhaT

Construction of general purpose expression plasmids for use in transformation of
Escherichia coli

The expression vector pTacIQ

The E. coli expression vector, pTacIQ, contains the lacIq gene (Farabaugh,
Nature 274, 5673 (1978)) and tac promoter (Amann et al., Gene 25, 167 (1983))
inserted into the EcoRI site of pBR322 (Sutcliffe et al., Cold Spring Harb. Symp.
Quant. Biol. 43, 77 (1979)). A multiple cloning site and terminator sequence
(SEQ ID NO:20) replaces the pBR322 sequence from EcoRI to SphI.

Subcloning the glycerol dehydratase genes (dhaB1, 2, 3)

The open reading frame for the dhaB3 gene (incorporating an EcoRI site at the
5' end and a XbaI site at the 3' end) was amplified from pHK28-26 by PCR using
primers (SEQ ID NOS:21 and 22). The product was subcloned into pLitmus29
(New England Biolab, Inc., Beverly, MA) to generate the plasmid pDHAB3
containing dhaB3.

The region containing the entire coding region for the four genes of the
dhaB operon from pHK28-26 was cloned into pBluescriptII KS+ (Stratagene, La
Jolla, CA) using the restriction enzymes KpnI and EcoRI to create the plasmid
pM7.

The dhaBX gene was removed by digesting the plasmid pM7, which
contains dhaB(1,2,3,4), with ApaI and XbaI (deleting part of dhaB3 and all of
dhaBX). The resulting 5.9 kb fragment was purified and ligated with the 325-bp
ApaI-XbaI fragment from plasmid pDHAB3 (restoring the dhaB3 gene) to create
pM11, which contains dhaB(1,2,3).

The open reading frame for the dhaB1 gene (incorporating a HindIII site
and a consensus RBS ribosome binding site at the 5' end and a XbaI site at the 3'
end) was amplified from pHK28-26 by PCR using primers (SEQ ID NO:23 and
SEQ ID NO:24). The product was subcloned into pLitmus28 (New England
Biolab, Inc.) to generate the plasmid pDT1 containing dhaB1.

A NotI-XbaI fragment from pM11 containing part of the dhaB1 gene, the
dhaB2 gene and the dhaB3 gene was inserted into pDT1 to create the dhaB
expression plasmid, pDT2. The HindIII-XbaI fragment containing the
dhaB(1,2,3) genes from pDT2 was inserted into pTacIQ to create pDT3.
Subcloning the 1,3-propanediol dehydrogenase gene (dhaT)

The KpnI-SacI fragment of pHK28-26, containing the complete
1,3-propanediol dehydrogenase (dhaT) gene, was subcloned into pBluescriptII
KS+ creating plasmid pAH1. The dhaT gene (incorporating an XbaI site at the 5'
end and a BamHI site at the 3' end) was amplified by PCR from pAH1 as template
DNA using synthetic primers (SEQ ID NO:25 with SEQ ID NO:26). The product
was subcloned into pCR-Script (Stratagene) at the SrfI site to generate the
plasmids pAH4 and pAH5 containing dhaT. The plasmid pAH4 contains the
dhaT gene in the correct orientation for expression from the lac promoter in
pCR-Script and pAH5 contains the dhaT gene in the opposite orientation. The
XbaI-BamHI fragment from pAH4 containing the dhaT gene was inserted into
pTacIQ to generate plasmid pAH8. The HindIII-BamHI fragment from pAH8
containing the RBS and dhaT gene was inserted into pBluescriptII KS+ to create
pAH11. The HindIII-SalI fragment from pAH8 containing the RBS, dhaT gene
and terminator was inserted into pBluescriptII SK+ to create pAH12.

Construction of an expression cassette for dhaB(1,2,3) and dhaT

An expression cassette for dhaB(1,2,3) and dhaT was assembled from
the individual dhaB(1,2,3) and dhaT subclones described above using standard
molecular biology methods. The SpeI-KpnI fragment from pAH8 containing the
RBS, dhaT gene and terminator was inserted into the XbaI-KpnI sites of pDT3 to
create pAH23. The SmaI-EcoRI fragment between the dhaB3 and dhaT gene of
pAH23 was removed to create pAH26. The SpeI-NotI fragment containing an
EcoRI site from pDT2 was used to replace the SpeI-NotI fragment of pAH26 to
generate pAH27.

Construction of expression cassette for dhaT and dhaB(1,2,3)

An expression cassette for dhaT and dhaB(1,2,3) was assembled from the
individual dhaB(1,2,3) and dhaT subclones described previously using standard
molecular biology methods. A SpeI-SacI fragment containing the dhaB(1,2,3)
genes from pDT3 was inserted into pAH11 at the SpeI-SacI sites to create pAH24.

Cloning and expression of glycerol 3-phosphatase for increased glycerol
production in E. coli

The Saccharomyces cerevisiae chromosome V lambda clone 6592 (GenBank,
accession # U18813x11) was obtained from ATCC. The glycerol
3-phosphate phosphatase (GPP2) gene (incorporating a BamHI-RBS-XbaI site at
the 5' end and a SmaI site at the 3' end) was cloned by PCR from the
lambda clone as target DNA using synthetic primers (SEQ ID NO:27 with SEQ ID
NO:28). The product was subcloned into pCR-Script (Stratagene) at the SrfI site
to generate the plasmid pAH15 containing GPP2. The plasmid pAH15 contains
the GPP2 gene in the inactive orientation for expression from the lac promoter in
pCR-Script SK+. The BamHI-SmaI fragment from pAH15 containing the GPP2
gene was inserted into pBlueScriptII SK+ to generate plasmid pAH19. The
pAH19 contains the GPP2 gene in the correct orientation for expression from the
lac promoter. The XbaI-PstI fragment from pAH19 containing the GPP2 gene
was inserted into pPHOX2 to create plasmid pAH21.

Plasmids for the expression of dhaT, dhaB(1,2,3) and GPP2 genes

A SalI-EcoRI-XbaI linker (SEQ ID NOS:29 and 30) was inserted into
pAH5 which was digested with the restriction enzymes SalI-XbaI to create
pDT16. The linker destroys the XbaI site. The 1 kb SalI-MluI fragment from
pDT16 was then inserted into pAH24 replacing the existing SalI-MluI fragment to
create pDT18.

The 4.1 kb EcoRI-XbaI fragment containing the expression cassette for
dhaT and dhaB(1,2,3) from pDT18 and the 1.0 kb XbaI-SalI fragment containing
the GPP2 gene from pAH21 were inserted into the vector pMMB66EH (Fürste et
al., Gene, 48, 119 (1986)) digested with the restriction enzymes EcoRI and SalI
to create pDT20.

Plasmids for the over-expression of DAR1 in E. coli

DAR1 was isolated by PCR cloning from genomic S. cerevisiae DNA
using synthetic primers (SEQ ID NO:46 with SEQ ID NO:47). Successful PCR
cloning places an NcoI site at the 5' end of DAR1 where the ATG within NcoI is
the DAR1 initiator methionine. At the 3' end of DAR1 a BamHI site is introduced
following the translation terminator. The PCR fragments were digested with NcoI
+ BamHI and cloned into the same sites within the expression plasmid pTrc99A
(Pharmacia, Piscataway, New Jersey) to give pDAR1A.

In order to create a better ribosome binding site at the 5' end of DAR1, a
SpeI-RBS-NcoI linker obtained by annealing synthetic primers (SEQ ID NO:48
with SEQ ID NO:49) was inserted into the NcoI site of pDAR1A to create
pAH40. Plasmid pAH40 contains the new RBS and DAR1 gene in the correct
orientation for expression from the trc promoter of pTrc99A (Pharmacia). The
NcoI-BamHI fragment from pDAR1A and a second set of SpeI-RBS-NcoI linker
obtained by annealing synthetic primers (SEQ ID NO:31 with SEQ ID NO:32)
were inserted into the SpeI-BamHI site of pBluescript II-SK+ (Stratagene) to create
pAH41. The construct pAH41 contains an ampicillin resistance gene. The
NcoI-BamHI fragment from pDAR1A and a second set of SpeI-RBS-NcoI linker
obtained by annealing synthetic primers (SEQ ID NO:31 with SEQ ID NO:32)
were inserted into the SpeI-BamHI site of pBC-SK+ (Stratagene) to create pAH42.
The construct pAH42 contains a chloramphenicol resistance gene.

Construction of an expression cassette for DAR1 and GPP2

An expression cassette for DAR1 and GPP2 was assembled from the
individual DAR1 and GPP2 subclones described above using standard molecular
biology methods. The BamHI-PstI fragment from pAH19 containing the RBS
and GPP2 gene was inserted into pAH40 to create pAH43. The BamHI-PstI
fragment from pAH19 containing the RBS and GPP2 gene was inserted into
pAH41 to create pAH44. The same BamHI-PstI fragment from pAH19
containing the RBS and GPP2 gene was also inserted into pAH42 to create
pAH45.

E. coli strain construction

E. coli W1485 is a wild-type K-12 strain (ATCC 12435). This strain was
transformed with the plasmids pDT20 and pAH42 and selected on LA (Luria
Agar, Difco) plates supplemented with 50 ug/mL carbenicillin and 10 ug/mL
chloramphenicol.

Production of 1,3-propanediol from glucose

E. coli W1485/pDT20/pAH42 was transferred from a plate to 50 mL of a
medium containing per liter: 22.5 g glucose, 6.85 g K2HPO4, 6.3 g (NH4)2SO4,
0.5 g NaHCO3, 2.5 g NaCl, 8 g yeast extract, 8 g tryptone, 2.5 mg vitamin B12,
2.5 mL modified Balch's trace-element solution, 50 mg carbenicillin and 10 mg
chloramphenicol, final pH 6.8 (HCl), then filter sterilized. The composition of
modified Balch's trace-element solution can be found in Methods for General and
Molecular Bacteriology (P. Gerhardt et al., eds, p. 158, American Society for
Microbiology, Washington, DC (1994)). After incubating at 37 °C, 300 rpm for
6 h, 0.5 g glucose and IPTG (final concentration = 0.2 mM) were added and
shaking was reduced to 100 rpm. Samples were analyzed by GC/MS. After 24 h,
W1485/pDT20/pAH42 produced 1.1 g/L glycerol and 195 mg/L 1,3-propanediol.
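
To compare these titers with the millimolar values reported elsewhere in the examples, the mass concentrations can be converted to molar units. The following Python sketch is an illustrative calculation only; the atomic masses are standard rounded values and the helper names are hypothetical.

```python
# Standard atomic masses (g/mol), rounded to three decimals.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}

GLYCEROL = {"C": 3, "H": 8, "O": 3}      # C3H8O3
PROPANEDIOL = {"C": 3, "H": 8, "O": 2}   # 1,3-propanediol, C3H8O2


def molar_mass(formula):
    """Molar mass in g/mol from an element -> count mapping."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def g_per_l_to_mm(g_per_l, formula):
    """Convert a mass concentration in g/L to mmol/L."""
    return g_per_l / molar_mass(formula) * 1000.0


# Titers reported after 24 h for W1485/pDT20/pAH42.
glycerol_mm = g_per_l_to_mm(1.1, GLYCEROL)
propanediol_mm = g_per_l_to_mm(0.195, PROPANEDIOL)
```

On this basis the reported titers correspond to roughly 12 mM glycerol and 2.6 mM 1,3-propanediol.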
EXAMPLE 3

CLONING AND EXPRESSION OF dhaB AND dhaT
IN Saccharomyces cerevisiae

Expression plasmids that could exist as replicating episomal elements were
constructed for each of the four dha genes. For all expression plasmids a yeast
ADH1 promoter was present and separated from a yeast ADH1 transcription
terminator by fragments of DNA containing recognition sites for one or more
restriction endonucleases. Each expression plasmid also contained the gene for
β-lactamase for selection in E. coli on media containing ampicillin, an origin of
replication for plasmid maintenance in E. coli, and a 2 micron origin of
replication for maintenance in S. cerevisiae. The selectable nutritional markers
used for yeast and present on the expression plasmids were one of the following:
HIS3 gene encoding imidazoleglycerolphosphate dehydratase, URA3 gene
encoding orotidine 5'-phosphate decarboxylase, TRP1 gene encoding
N-(5'-phosphoribosyl)-anthranilate isomerase, and LEU2 encoding
β-isopropylmalate dehydrogenase.

The open reading frames for dhaT, dhaB3, dhaB2 and dhaB1 were
amplified from pHK28-26 (SEQ ID NO:19) by PCR using primers (SEQ ID
NO:38 with SEQ ID NO:39, SEQ ID NO:40 with SEQ ID NO:41, SEQ ID NO:42
with SEQ ID NO:43, and SEQ ID NO:44 with SEQ ID NO:45 for dhaT, dhaB3,
dhaB2 and dhaB1, respectively) incorporating EcoRI sites at the 5' ends (10 mM
Tris pH 8.3, 50 mM KCl, 1.5 mM MgCl2, 0.0001% gelatin, 200 uM dATP,
200 uM dCTP, 200 uM dGTP, 200 uM dTTP, 1 uM each primer, 1-10 ng target
DNA, 25 units/mL Amplitaq™ DNA polymerase (Perkin-Elmer Cetus, Norwalk,
CT)). PCR parameters were 1 min at 94 °C, 1 min at 55 °C, 1 min at 72 °C,
35 cycles. The products were subcloned into the EcoRI site of pHIL-D4 (Phillips
Petroleum, Bartlesville, OK) to generate the plasmids pMP13, pMP14, pMP20
and pMP15 containing dhaT, dhaB3, dhaB2 and dhaB1, respectively.
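
The PCR conditions above are given as final concentrations and per-step times. A short Python sketch shows how they translate into a pipetting volume and a total hold time. The 100 uL reaction volume and the stock concentrations used in the usage note are hypothetical, since the text does not state them, and the time estimate ignores ramping between temperatures.

```python
# Cycling parameters from the text: three 1-min steps, 35 cycles.
CYCLES = 35
STEP_MINUTES = {"denature_94C": 1, "anneal_55C": 1, "extend_72C": 1}


def hold_time_minutes(cycles=CYCLES, steps=STEP_MINUTES):
    """Total time spent at the programmed temperatures, excluding ramps."""
    return cycles * sum(steps.values())


def stock_volume_ul(final_conc, stock_conc, reaction_ul):
    """Volume of stock needed, from C1 * V1 = C2 * V2.

    final_conc and stock_conc must be in the same units.
    """
    return final_conc * reaction_ul / stock_conc
```

For example, under the assumed 100 uL reaction, `stock_volume_ul(200, 10000, 100)` gives the volume of a hypothetical 10 mM dNTP stock needed for 200 uM final, and `stock_volume_ul(1.5, 25, 100)` does the same for a hypothetical 25 mM MgCl2 stock.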
Construction of dhaB1 expression plasmid pMCK10

The 7.8 kb replicating plasmid pGADGH (Clontech, Palo Alto, CA) was
digested with HindIII, dephosphorylated, and ligated to the dhaB1 HindIII
fragment from pMP15. The resulting plasmid (pMCK10) had dhaB1 correctly
oriented for transcription from the ADH1 promoter and contained a LEU2 marker.
Construction of dhaB2 expression plasmid pMCK17

Plasmid pGADGH (Clontech, Palo Alto, CA) was digested with HindIII
and the single-strand ends converted to EcoRI ends by ligation with HindIII-XmnI
and EcoRI-XmnI adaptors (New England Biolabs, Beverly, MA). Selection for
plasmids with correct EcoRI ends was achieved by ligation to a kanamycin
resistance gene on an EcoRI fragment from plasmid pUC4K (Pharmacia Biotech,
Uppsala), transformation into E. coli strain DH5α and selection on LB plates
containing 25 ug/mL kanamycin. The resulting plasmid (pGAD/KAN2) was
digested with SnaBI and EcoRI and a 1.8 kb fragment with the ADH1 promoter
was isolated. Plasmid pGBT9 (Clontech, Palo Alto, CA) was digested with SnaBI
and EcoRI, and the 1.5 kb ADH1/GAL4 fragment replaced by the 1.8 kb ADH1
promoter fragment isolated from pGAD/KAN2 by digestion with SnaBI and
EcoRI. The resulting vector (pMCK11) is a replicating plasmid in yeast with an
ADH1 promoter and terminator and a TRP1 marker. Plasmid pMCK11 was
digested with EcoRI, dephosphorylated, and ligated to the dhaB2 EcoRI fragment
from pMP20. The resulting plasmid (pMCK17) had dhaB2 correctly oriented for
transcription from the ADH1 promoter and contained a TRP1 marker.
Construction of dhaB3 expression plasmid pMCK30

Plasmid pGBT9 (Clontech) was digested with NaeI and PvuII and the 1 kb
TRP1 gene removed from this vector. The TRP1 gene was replaced by a URA3
gene donated as a 1.7 kb AatII/NaeI fragment from plasmid pRS406 (Stratagene)
to give the intermediary vector pMCK32. The truncated ADH1 promoter present
on pMCK32 was removed on a 1.5 kb SnaBI/EcoRI fragment, and replaced with a
full-length ADH1 promoter on a 1.8 kb SnaBI/EcoRI fragment from plasmid
pGAD/KAN2 to yield the vector pMCK26. The unique EcoRI site on pMCK26
was used to insert an EcoRI fragment with dhaB3 from plasmid pMP14 to yield
pMCK30. The pMCK30 replicating expression plasmid has dhaB3 orientated for
expression from the ADH1 promoter, and has a URA3 marker.

Construction of dhaT expression plasmid pMCK35

Plasmid pGBT9 (Clontech) was digested with NaeI and PvuII and the 1 kb
TRP1 gene removed from this vector. The TRP1 gene was replaced by a HIS3
gene donated as an XmnI/NaeI fragment from plasmid pRS403 (Stratagene) to
give the intermediary vector pMCK33. The truncated ADH1 promoter present on
pMCK33 was removed on a 1.5 kb SnaBI/EcoRI fragment, and replaced with a
full-length ADH1 promoter on a 1.8 kb SnaBI/EcoRI fragment from plasmid
pGAD/KAN2 to yield the vector pMCK31. The unique EcoRI site on pMCK31
was used to insert an EcoRI fragment with dhaT from plasmid pMP13 to yield
pMCK35. The pMCK35 replicating expression plasmid has dhaT orientated for
expression from the ADH1 promoter, and has a HIS3 marker.

Transformation of S. cerevisiae with dha expression plasmids

S. cerevisiae strain YPH500 (ura3-52 lys2-801 ade2-101 trp1-Δ63
his3-Δ200 leu2-Δ1) (Sikorski R. S. and Hieter P., Genetics 122, 19-27, (1989))
purchased from Stratagene (La Jolla, CA) was transformed with 1-2 ug of plasmid
DNA using a Frozen-EZ Yeast Transformation Kit (Catalog #T2001) (Zymo
Research, Orange, CA). Colonies were grown on Supplemented Minimal
Medium (SMM - 0.67% yeast nitrogen base without amino acids, 2% glucose) for
3-4 d at 29 °C with one or more of the following additions: adenine sulfate
(20 mg/L), uracil (20 mg/L), L-tryptophan (20 mg/L), L-histidine (20 mg/L),
L-leucine (30 mg/L), L-lysine (30 mg/L). Colonies were streaked on selective
plates and used to inoculate liquid media.

Screening of S. cerevisiae transformants for dha genes

Chromosomal DNA from URA+, HIS+, TRP+, LEU+ transformants was
analyzed by PCR using primers specific for each gene (SEQ ID NOS:38-45). The
presence of all four open reading frames was confirmed.

Expression of dhaB and dhaT activity in transformed S. cerevisiae

The presence of active glycerol dehydratase (dhaB) and 1,3-propanediol
oxido-reductase (dhaT) was demonstrated using in vitro enzyme assays.
Additionally, western blot analysis confirmed protein expression from all four
open reading frames.

Strain YPH500, transformed with the group of plasmids pMCK10,
pMCK17, pMCK30 and pMCK35, was grown on Supplemented Minimal
Medium containing 0.67% yeast nitrogen base without amino acids, 2% glucose,
20 mg/L adenine sulfate, and 30 mg/L L-lysine. Cells were homogenized and
extracts assayed for dhaB activity. A specific activity of 0.12 units per mg protein
was obtained for glycerol dehydratase, and 0.024 units per mg protein for
1,3-propanediol oxido-reductase.

EXAMPLE 4

PRODUCTION OF 1,3-PROPANEDIOL FROM D-GLUCOSE
USING RECOMBINANT Saccharomyces cerevisiae

S. cerevisiae YPH500, harboring the group of plasmids pMCK10,
pMCK17, pMCK30 and pMCK35, was grown in a BiostatB fermenter (B Braun
Biotech, Inc.) in 1.0 L of minimal medium initially containing 20 g/L glucose,
6.7 g/L yeast nitrogen base without amino acids, 40 mg/L adenine sulfate and
60 mg/L L-lysine HCl. During the course of the growth, an additional equivalent
of yeast nitrogen base, adenine and lysine was added. The fermenter was
controlled at pH 5.5 with addition of 10% phosphoric acid and 2 M NaOH, 30 °C,
and 40% dissolved oxygen tension through agitation control. After 38 h, the cells
(OD600 = 5.8 AU) were harvested by centrifugation and resuspended in base
medium (6.7 g/L yeast nitrogen base without amino acids, 20 mg/L adenine
sulfate, 30 mg/L L-lysine HCl, and 50 mM potassium phosphate buffer, pH 7.0).

Reaction mixtures containing cells (OD600 = 20 AU) in a total volume of
4 mL of base media supplemented with 0.5% glucose, 5 ug/mL coenzyme B12 and
0, 10, 20, or 40 mM chloroquine were prepared, in the absence of light and
oxygen (nitrogen sparging), in 10 mL crimp sealed serum bottles and incubated at
30 °C with shaking. After 30 h, aliquots were withdrawn and analyzed by HPLC.
The results are shown in Table 3.

Table 3

Production of 1,3-propanediol using recombinant S. cerevisiae

reaction     chloroquine (mM)     1,3-propanediol (mM)

1            0                    0.2
2            10                   0.2
3            20                   0.3
4            40                   0.7


EXAMPLE 5

USE OF A S. cerevisiae DOUBLE TRANSFORMANT FOR PRODUCTION
OF 1,3-PROPANEDIOL FROM D-GLUCOSE WHERE dhaB AND dhaT ARE
INTEGRATED INTO THE GENOME

Example 5 prophetically demonstrates the transformation of S. cerevisiae
with dhaB1, dhaB2, dhaB3, and dhaT and the stable integration of the genes into
the yeast genome for the production of 1,3-propanediol from glucose.

Construction of expression cassettes

Four expression cassettes (dhaB1, dhaB2, dhaB3, and dhaT) are
constructed for glucose-induced and high-level constitutive expression of these
genes in yeast, Saccharomyces cerevisiae. These cassettes consist of: (i) the
phosphoglycerate kinase (PGK) promoter from S. cerevisiae strain S288C; (ii) one
of the genes dhaB1, dhaB2, dhaB3, or dhaT; and (iii) the PGK terminator from
S. cerevisiae strain S288C. The PCR-based technique of gene splicing by overlap
extension (Horton et al., BioTechniques, 8:528-535, (1990)) is used to recombine
DNA sequences to generate these cassettes with seamless joints for optimal
expression of each gene. These cassettes are cloned individually into a suitable
vector (pLITMUS 39) with restriction sites amenable to multi-cassette cloning in
yeast expression plasmids.
Construction of yeast integration vectors

Vectors used to effect the integration of expression cassettes into the yeast
genome are constructed. These vectors contain the following elements: (i) a
polycloning region into which expression cassettes are subcloned; (ii) a unique
marker used to select for stable yeast transformants; (iii) replication origin and
selectable marker allowing gene manipulation in E. coli prior to transforming
yeast. One integration vector contains the URA3 auxotrophic marker (YIp352b),
and a second integration vector contains the LYS2 auxotrophic marker (pKP7).

Construction of yeast expression plasmids

Expression cassettes for dhaB1 and dhaB2 are subcloned into the
polycloning region of the YIp352b (expression plasmid #1), and expression
cassettes for dhaB3 and dhaT are subcloned into the polycloning region of pKP7
(expression plasmid #2).

Transformation of yeast with expression plasmids

S. cerevisiae (ura3, lys2) is transformed with expression plasmid #1 using
Frozen-EZ Yeast Transformation kit (Zymo Research, Orange, CA), and
transformants selected on plates lacking uracil. Integration of expression cassettes
for dhaB1 and dhaB2 is confirmed by PCR analysis of chromosomal DNA.
Selected transformants are re-transformed with expression plasmid #2 using
Frozen-EZ Yeast Transformation kit, and double transformants selected on plates
lacking lysine. Integration of expression cassettes for dhaB3 and dhaT is
confirmed by PCR analysis of chromosomal DNA. The presence of all four
expression cassettes (dhaB1, dhaB2, dhaB3, dhaT) in double transformants is
confirmed by PCR analysis of chromosomal DNA.

Protein production from double-transformed yeast

Production of proteins encoded by dhaB1, dhaB2, dhaB3 and dhaT from
double-transformed yeast is confirmed by Western blot analysis.

Enzyme activity from double-transformed yeast

Active glycerol dehydratase and active 1,3-propanediol dehydrogenase
from double-transformed yeast are confirmed by enzyme assay as described in
General Methods above.

Production of 1,3-propanediol from double-transformed yeast

Production of 1,3-propanediol from glucose in double-transformed yeast is
demonstrated essentially as described in Example 4.
EXAMPLE 6

CONSTRUCTION OF PLASMIDS CONTAINING DAR1/GPP2
OR dhaT/dhaB1-3 AND TRANSFORMATION INTO KLEBSIELLA SPECIES

K. pneumoniae (ATCC 25955), K. pneumoniae (ECL2106), and
K. oxytoca (ATCC 8724) are naturally resistant to ampicillin (up to 150 µg/mL)
and kanamycin (up to 50 µg/mL), but sensitive to tetracycline (10 µg/mL) and
chloramphenicol (25 µg/mL). Consequently, replicating plasmids which encode
resistance to these latter two antibiotics are potentially useful as cloning vectors
for these Klebsiella strains. The wild-type K. pneumoniae (ATCC 25955), the
glucose-derepressed K. pneumoniae (ECL2106), and K. oxytoca (ATCC 8724)
were successfully transformed to tetracycline resistance by electroporation with
the moderate-copy-number plasmid, pBR322 (New England Biolabs, Beverly,
MA). This was accomplished by the following procedure: Ten mL of an
overnight culture was inoculated into 1 L LB (1% (w/v) Bacto-tryptone (Difco,
Detroit, MI), 0.5% (w/v) Bacto-yeast extract (Difco) and 0.5% (w/v) NaCl
(Sigma, St. Louis, MO)) and the culture was incubated at 37 °C to an OD600 of
0.5-0.7. The cells were chilled on ice, harvested by centrifugation at 4000 x g for
15 min, and resuspended in 1 L ice-cold sterile 10% glycerol. The cells were
repeatedly harvested by centrifugation and progressively resuspended in 500 mL,
20 mL and, finally, 2 mL ice-cold sterile 10% glycerol. For electroporation,
40 µL of cells were mixed with 1-2 µL DNA in a chilled 0.2 cm cuvette and were
pulsed at 200 Ω, 2.5 kV for 4-5 msec using a BioRad Gene Pulser (BioRad,
Richmond, CA). One mL of SOC medium (2% (w/v) Bacto-tryptone (Difco),
0.5% (w/v) Bacto-yeast extract (Difco), 10 mM NaCl, 10 mM MgCl2, 10 mM
MgSO4, 2.5 mM KCl and 20 mM glucose) was added to the cells and, after the
suspension was transferred to a 17 x 100 mm sterile polypropylene tube, the
culture was incubated for 1 hr at 37 °C, 225 rpm. Aliquots were plated on
selective medium, as indicated. Analyses of the plasmid DNA from independent
tetracycline-resistant transformants showed the restriction endonuclease digestion
patterns typical of pBR322, indicating that the vector was stably maintained after
overnight culture at 37 °C in LB containing tetracycline (10 µg/mL). Thus, this
vector, and derivatives such as pBR329 (ATCC 37264) which encodes resistance
to ampicillin, tetracycline and chloramphenicol, may be used to introduce the
DAR1/GPP2 and dhaT/dhaB1-3 expression cassettes into K. pneumoniae and
K. oxytoca.

The DAR1 and GPP2 genes may be obtained by PCR-mediated
amplification from the Saccharomyces cerevisiae genome, based on their known
DNA sequence. The genes are then transformed into K. pneumoniae or K. oxytoca
under the control of one or more promoters that may be used to direct their
expression in media containing glucose. For convenience, the genes were
obtained on a 2.4 kb DNA fragment obtained by digestion of plasmid pAH44 with
the PvuII restriction endonuclease, whereby the genes are already arranged in an
expression cassette under the control of the E. coli lac promoter. This DNA
fragment was ligated to PvuII-digested pBR329, producing the insertional
inactivation of its chloramphenicol resistance gene. The ligated DNA was used to
transform E. coli DH5α (Gibco, Gaithersburg, MD). Transformants were selected
by their resistance to tetracycline (10 µg/mL) and were screened for their
sensitivity to chloramphenicol (25 µg/mL). Analysis of the plasmid DNA from
tetracycline-resistant, chloramphenicol-sensitive transformants confirmed the
presence of the expected plasmids, in which the Plac-dar1-gpp2 expression
cassette was subcloned in either orientation into the pBR329 PvuII site. These
plasmids, designated pJSP1A (clockwise orientation) and pJSP1B
(counter-clockwise orientation), were separately transformed by electroporation into
K. pneumoniae (ATCC 25955), K. pneumoniae (ECL2106) and K. oxytoca
(ATCC 8724) as described. Transformants were selected by their resistance to
tetracycline (10 µg/mL) and were screened for their sensitivity to chloramphenicol
(25 µg/mL). Restriction analysis of the plasmids isolated from independent
transformants showed only the expected digestion patterns, and confirmed that
they were stably maintained at 37 °C with antibiotic selection. The expression of
the DAR1 and GPP2 genes may be enhanced by the addition of IPTG
(0.2-2.0 mM) to the growth medium.
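The screen used above relies on insertional inactivation: an insert in the PvuII site of pBR329 disrupts chloramphenicol resistance while tetracycline resistance is retained. A minimal Python sketch of how the two phenotypes are interpreted follows; the function name and category labels are hypothetical, and the sketch assumes the host itself is sensitive to both antibiotics, as stated for these strains.

```python
def classify_colony(tet_resistant: bool, cm_resistant: bool) -> str:
    """Interpret antibiotic phenotypes for a pBR329-based insertional-inactivation screen.

    Assumes the untransformed host is sensitive to both tetracycline and
    chloramphenicol, so any resistance is plasmid-borne.
    """
    if not tet_resistant:
        # No functional plasmid; such cells do not survive tetracycline selection.
        return "no plasmid"
    if cm_resistant:
        # Chloramphenicol gene intact, so the PvuII site carries no insert.
        return "vector without insert"
    # Tetracycline-resistant and chloramphenicol-sensitive: insert disrupts the gene.
    return "candidate recombinant"


for phenotype in [(True, False), (True, True), (False, False)]:
    print(phenotype, "->", classify_colony(*phenotype))
```

The phenotype only nominates candidates; as in the text, restriction analysis of the isolated plasmid is what confirms the insert and its orientation.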

The four K. pneumoniae dhaB(1-3) and dhaT genes may be obtained by
PCR-mediated amplification from the K. pneumoniae genome, based on their
known DNA sequence. These genes are then transformed into K. pneumoniae
under the control of one or more promoters that may be used to direct their
expression in media containing glucose. For convenience, the genes were
obtained on an approximately 4.0 kb DNA fragment obtained by digestion of
plasmid pAH24 with the KpnI/SacI restriction endonucleases, whereby the genes
are already arranged in an expression cassette under the control of the E. coli lac
promoter. This DNA fragment was ligated to similarly digested pBC-KS+
(Stratagene, La Jolla, CA) and used to transform E. coli DH5α. Transformants
were selected by their resistance to chloramphenicol (25 µg/mL) and were
screened for a white colony phenotype on LB agar containing X-gal. Restriction
analysis of the plasmid DNA from chloramphenicol-resistant transformants
demonstrating the white colony phenotype confirmed the presence of the expected
plasmid, designated pJSP2, in which the dhaT-dhaB(1-3) genes were subcloned
under the control of the E. coli lac promoter.

To enhance the conversion of glucose to 3G, this plasmid was separately
transformed by electroporation into K. pneumoniae (ATCC 25955) (pJSP1A),
K. pneumoniae (ECL2106) (pJSP1A) and K. oxytoca (ATCC 8724) (pJSP1A)
already containing the Plac-dar1-gpp2 expression cassette. Cotransformants were
selected by their resistance to both tetracycline (10 µg/mL) and chloramphenicol
(25 µg/mL). Restriction analysis of the plasmids isolated from independent
cotransformants showed the digestion patterns expected for both pJSP1A and
pJSP2. The expression of the DAR1, GPP2, dhaB(1-3), and dhaT genes may be
enhanced by the addition of IPTG (0.2-2.0 mM) to the medium.

EXAMPLE 7

Production of 1,3-propanediol from glucose by K. pneumoniae

Klebsiella pneumoniae strains ECL 2106 and 2106-47, both transformed
with pJSP1A, and ATCC 25955, transformed with pJSP1A and pJSP2, were
grown in a 5 L Applikon fermenter under various conditions (see Table 4) for the
production of 1,3-propanediol from glucose. Strain 2106-47 is a
fluoroacetate-tolerant derivative of ECL 2106 which was obtained from a
fluoroacetate/lactate selection plate as described in Bauer et al., Appl. Environ.
Microbiol. 56, 1296 (1990). In each case, the medium used contained 50-100 mM
potassium phosphate buffer, pH 7.5, 40 mM (NH4)2SO4, 0.1% (w/v) yeast extract,
10 µM CoCl2, 6.5 µM CuCl2, 100 µM FeCl3, 18 µM FeSO4, 5 µM H3BO3,
50 µM MnCl2, 0.1 µM Na2MoO4, 25 µM ZnCl2, 0.82 mM MgSO4, 0.9 mM CaCl2,
and 10-20 g/L glucose. Additional glucose was fed, with residual glucose
maintained in excess. Temperature was controlled at 37 °C and pH controlled at
7.5 with 5N KOH or NaOH. Appropriate antibiotics were included for plasmid
maintenance; IPTG (isopropyl-β-D-thiogalactopyranoside) was added at the
indicated concentrations as well. For anaerobic fermentations, 0.1 vvm nitrogen
was sparged through the reactor; when the dO setpoint was 5%, 1 vvm air was
sparged through the reactor and the medium was supplemented with vitamin B12.
Final concentrations and overall yields (g/g) are shown in Table 4.

Table 4

Production of 1,3-propanediol from glucose by K. pneumoniae

Organism               dO    IPTG, mM   vitamin B12, mg/L   Titer, g/L   Yield, g/g
25955[pJSP1A/pJSP2]    0     0.5        0                   8.1          16%
25955[pJSP1A/pJSP2]    5%    0.2        0.5                 5.2          4%
2106[pJSP1A]           0     0          0                   4.9          17%
2106[pJSP1A]           5%    0          5                   6.5          12%
2106-47[pJSP1A]        5%    0.2        0.5                 10.9         12%

EXAMPLE 8

Conversion of carbon substrates to 1,3-propanediol by recombinant
K. pneumoniae containing dar1, gpp2, dhaB, and dhaT

A. Conversion of D-fructose to 1,3-propanediol by various K. pneumoniae
recombinant strains:

Single colonies of K. pneumoniae (ATCC 25955 pJSP1A), K. pneumoniae
(ATCC 25955 pJSP1A/pJSP2), K. pneumoniae (ECL 2106 pJSP1A), and
K. pneumoniae (ECL 2106 pJSP1A/pJSP2) were transferred from agar plates
and in separate culture tubes were subcultured overnight in Luria-Bertani (LB)
broth containing the appropriate antibiotic agent(s). Each reaction used a 50 mL
flask containing 45 mL of a steri-filtered minimal medium, defined as LLMM/F,
which contains per liter: 10 g fructose; 1 g yeast extract; 50 mmoles potassium
phosphate, pH 7.5; 40 mmoles (NH4)2SO4; 0.09 mmoles calcium chloride;
2.38 mg CoCl2·6H2O; 0.88 mg CuCl2·2H2O; 27 mg FeCl3·6H2O; 5 mg
FeSO4·7H2O; 0.31 mg H3BO3; 10 mg MnCl2·4H2O; 0.023 mg Na2MoO4·2H2O;
3.4 mg ZnCl2; 0.2 g MgSO4·7H2O. Tetracycline at 10 µg/mL was added to the
medium for reactions using either of the single plasmid recombinants; 10 µg/mL
tetracycline and 25 µg/mL chloramphenicol were added for reactions using either
of the double plasmid recombinants. The medium was thoroughly sparged with
nitrogen prior to inoculation with 2 mL of the subculture. IPTG (I) at a final
concentration of 0.5 mM was added to some flasks. The flasks were capped, then
incubated at 37 °C, 100 rpm in a New Brunswick Series 25 incubator/shaker.
Reactions were run for at least 24 hours or until most of the carbon substrate was
converted into products. Samples were analyzed by HPLC. Table 5 describes the
yields of 1,3-propanediol produced from fructose by the various Klebsiella
recombinants.
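The LLMM/F trace components are given by mass of the hydrated salts, whereas the fermenter medium of Example 7 is given in molar units. The following Python sketch converts the per-liter masses to micromolar concentrations so the two recipes can be compared. The molar masses are standard values supplied here as an assumption (they are not stated in the text), and the function name is illustrative.

```python
# Molar masses in g/mol (standard values; not taken from the source text).
MOLAR_MASS = {
    "CoCl2.6H2O": 237.93,
    "CuCl2.2H2O": 170.48,
    "FeCl3.6H2O": 270.30,
    "FeSO4.7H2O": 278.01,
    "H3BO3": 61.83,
    "MnCl2.4H2O": 197.91,
    "Na2MoO4.2H2O": 241.95,
    "ZnCl2": 136.29,
    "MgSO4.7H2O": 246.47,
}

# LLMM/F amounts per liter, in mg, as listed in the medium description.
LLMM_F_MG_PER_L = {
    "CoCl2.6H2O": 2.38,
    "CuCl2.2H2O": 0.88,
    "FeCl3.6H2O": 27.0,
    "FeSO4.7H2O": 5.0,
    "H3BO3": 0.31,
    "MnCl2.4H2O": 10.0,
    "Na2MoO4.2H2O": 0.023,
    "ZnCl2": 3.4,
    "MgSO4.7H2O": 200.0,  # 0.2 g
}


def micromolar(salt: str, mg_per_l: float) -> float:
    """Convert mg/L of a salt to micromoles per liter."""
    # mg/L divided by g/mol gives mmol/L; multiply by 1000 for umol/L.
    return mg_per_l / MOLAR_MASS[salt] * 1000.0


for salt, mg in LLMM_F_MG_PER_L.items():
    print(f"{salt:14s} {mg:8.3f} mg/L = {micromolar(salt, mg):8.2f} uM")
```

Most converted values land close to the molar concentrations listed for the Example 7 medium (for instance, about 10 µM cobalt and about 18 µM ferrous iron), which suggests the two media share the same trace-metal base.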

Table 5

Production of 1,3-propanediol from D-fructose using recombinant Klebsiella

Klebsiella Strain      Medium        Conversion (%)   [3G] (g/L)   Yield Carbon (%)
2106 pBR329            LLMM/F        100              0            0
2106 pJSP1A            LLMM/F        50               0.66         15.5
2106 pJSP1A            LLMM/F + I    100              0.11         1.4
2106 pJSP1A/pJSP2      LLMM/F        58               0.26         5
25955 pBR329           LLMM/F        100              0            0
25955 pJSP1A           LLMM/F        100              0.3          4
25955 pJSP1A           LLMM/F + I    100              0.15         2
25955 pJSP1A/pJSP2     LLMM/F        100              0.9          11
25955 pJSP1A/pJSP2     LLMM/F + I    62               1.0          20

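The text does not define how the carbon yield in Table 5 was calculated. One plausible reading, offered here only as an assumption, is the moles of carbon recovered in 1,3-propanediol divided by the moles of carbon in the fructose actually consumed. The Python sketch below implements that definition; the function name and the molar masses (180.16 g/mol for fructose, 76.09 g/mol for 1,3-propanediol) are supplied for illustration.

```python
MW_FRUCTOSE = 180.16      # g/mol, C6H12O6
MW_PROPANEDIOL = 76.09    # g/mol, C3H8O2
CARBONS_FRUCTOSE = 6
CARBONS_PROPANEDIOL = 3


def carbon_yield_percent(pdo_g_per_l: float,
                         substrate_g_per_l: float,
                         conversion_percent: float) -> float:
    """Carbon yield (%) under the ASSUMED definition:
    carbon in product / carbon in consumed substrate."""
    consumed = substrate_g_per_l * conversion_percent / 100.0
    if consumed == 0:
        return 0.0
    carbon_in = consumed / MW_FRUCTOSE * CARBONS_FRUCTOSE
    carbon_out = pdo_g_per_l / MW_PROPANEDIOL * CARBONS_PROPANEDIOL
    return 100.0 * carbon_out / carbon_in


# LLMM/F starts at 10 g/L fructose; titer and conversion taken from two Table 5 rows.
print(round(carbon_yield_percent(0.66, 10.0, 50.0), 1))
print(round(carbon_yield_percent(0.26, 10.0, 58.0), 1))
```

Under this assumed definition the 0.66 g/L, 50% conversion row gives roughly 15.6% and the 0.26 g/L, 58% conversion row roughly 5.3%, close to the tabulated 15.5 and 5, so the definition appears consistent with those rows, though the source does not confirm it.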
B. Conversion of various carbon substrates to 1,3-propanediol by K. pneumoniae
(ATCC 25955 pJSP1A/pJSP2):

An aliquot (0.1 mL) of frozen stock cultures of K. pneumoniae
(ATCC 25955 pJSP1A/pJSP2) was transferred to 50 mL Seed medium in a
250 mL baffled flask. The Seed medium contained per liter: 0.1 molar NaK/PO4
buffer, pH 7.0; 3 g (NH4)2SO4; 5 g glucose; 0.15 g MgSO4·7H2O; 10 mL 100X
Trace Element solution; 25 mg chloramphenicol; 10 mg tetracycline; and 1 g yeast
extract. The 100X Trace Element solution contained per liter: 10 g citric acid,
1.5 g CaCl2·2H2O, 2.8 g FeSO4·7H2O, 0.39 g ZnSO4·7H2O, 0.38 g CuSO4·5H2O,
0.2 g CoCl2·6H2O, and 0.3 g MnCl2·4H2O. The resulting solution was titrated to
pH 7.0 with either KOH or H2SO4. The glucose, trace elements, antibiotics and
yeast extract were sterilized separately. The seed inoculum was grown overnight
at 35 °C and 250 rpm.

The reaction design was semi-aerobic. The system consisted of 30 mL
Reaction medium in 125 mL sealed flasks that were left partially open with an
aluminum foil strip. The Reaction medium contained per liter: 3 g (NH4)2SO4;
20 g carbon substrate; 0.15 molar NaK/PO4 buffer, pH 7.5; 1 g yeast extract;
0.15 g MgSO4·7H2O; 0.5 mmoles IPTG; 10 mL 100X Trace Element solution;
25 mg chloramphenicol; and 10 mg tetracycline. The resulting solution was
titrated to pH 7.5 with KOH or H2SO4. The carbon sources were: D-glucose
(Glc); D-fructose (Frc); D-lactose (Lac); D-sucrose (Suc); D-maltose (Mal); and
D-mannitol (Man). A few glass beads were included in the medium to improve
mixing. The reactions were initiated by addition of seed inoculum so that the
optical density of the cell suspension started at 0.1 AU as measured at λ 600 nm.
The flasks were incubated at 35 °C, 250 rpm. 3G production was measured by
HPLC after 24 hr. Table 6 describes the yields of 1,3-propanediol produced from
the various carbon substrates.
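The inoculation step above fixes the starting optical density at 0.1 AU but does not state the seed culture density or the volume added. As a minimal sketch, the volume can be estimated from a simple dilution balance, assuming optical density scales linearly with cell concentration and the fresh medium contributes no absorbance. The seed density used in the example is hypothetical.

```python
def seed_volume_ml(seed_od: float,
                   medium_ml: float = 30.0,
                   target_od: float = 0.1) -> float:
    """Volume of seed culture to add to `medium_ml` of fresh medium so the
    mixture starts at `target_od`.

    Derived from seed_od * v / (medium_ml + v) = target_od, assuming OD is
    linear in cell concentration and the fresh medium has an OD of zero.
    """
    if seed_od <= target_od:
        raise ValueError("seed culture must be denser than the target OD")
    return target_od * medium_ml / (seed_od - target_od)


# Hypothetical overnight seed culture at OD600 = 3.1 (not a value from the text).
print(round(seed_volume_ml(3.1), 2))
```

For dense overnight cultures the linearity assumption usually requires measuring the seed density on a diluted sample.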

Table 6

Production of 1,3-propanediol from various carbon substrates
using recombinant Klebsiella 25955 pJSP1A/pJSP2

                      1,3-Propanediol (g/L)
Carbon Substrate      Expt. 1     Expt. 2     Expt. 3
Glc                   0.89        1           1.6
Frc                   0.19        0.23        0.24
Lac                   0.15        0.58        0.56
Suc                   0.88        0.62
Mal                   0.05        0.03        0.02
Man                   0.03        0.05        0.04

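To compare substrates across the three experiments in Table 6, the titers can be averaged per substrate, skipping the missing third sucrose measurement. The Python sketch below is illustrative only; averaging is a convenience introduced here and not an analysis performed in the text.

```python
from statistics import mean

# 1,3-Propanediol titers (g/L) from Table 6; None marks the missing Suc value.
TABLE_6 = {
    "Glc": [0.89, 1.0, 1.6],
    "Frc": [0.19, 0.23, 0.24],
    "Lac": [0.15, 0.58, 0.56],
    "Suc": [0.88, 0.62, None],
    "Mal": [0.05, 0.03, 0.02],
    "Man": [0.03, 0.05, 0.04],
}


def mean_titer(values):
    """Mean of the available measurements, ignoring missing entries."""
    present = [v for v in values if v is not None]
    return mean(present)


# Substrates ordered from highest to lowest mean titer.
ranked = sorted(TABLE_6, key=lambda s: mean_titer(TABLE_6[s]), reverse=True)

for substrate in ranked:
    print(f"{substrate}: {mean_titer(TABLE_6[substrate]):.2f} g/L")
```

On these figures glucose and sucrose give the highest mean titers, while maltose and mannitol give only trace amounts under the conditions tested.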
SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT:

(A) ADDRESSEE: E. I. DU PONT DE NEMOURS AND COMPANY
(B) STREET: 1007 MARKET STREET
(C) CITY: WILMINGTON
(D) STATE: DELAWARE
(E) COUNTRY: U.S.A.
(F) ZIP: 19898
(G) TELEPHONE: 302-392-6112
(H) TELEFAX: 302-773-0164
(I) TELEX: 6717325

(A) ADDRESSEE: GENENCOR INTERNATIONAL, INC.
(B) STREET: 4 CAMBRIDGE PLACE
1870 SOUTH WINTON ROAD
(C) CITY: ROCHESTER
(D) STATE: NEW YORK
(E) COUNTRY: U.S.A.
(F) POSTAL CODE (ZIP): 14618

(ii) TITLE OF INVENTION: METHOD FOR THE RECOMBINANT
PRODUCTION OF 1,3-PROPANEDIOL

(iii) NUMBER OF SEQUENCES: 49

(iv) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: 3.50 INCH DISKETTE
(B) COMPUTER: IBM PC COMPATIBLE
(C) OPERATING SYSTEM: MICROSOFT WORD FOR WINDOWS 95
(D) SOFTWARE: MICROSOFT WORD VERSION 7.0A

(v) CURRENT APPLICATION DATA:

(A) APPLICATION NUMBER:
(B) FILING DATE:
(C) CLASSIFICATION:

(vi) PRIOR APPLICATION DATA:

(A) APPLICATION NUMBER: 60/030,601
(B) FILING DATE: NOVEMBER 13, 1996

(vii) ATTORNEY/AGENT INFORMATION:

(A) NAME: FLOYD, LINDA AXAMETHY
(B) REGISTRATION NO.: 33,692
(C) REFERENCE/DOCKET NUMBER: CR-9982

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Val Gin Ser Ala Gin 'Jlu lys lie Gly G_n Vai Vai Giu Gly Tyr Arg 
2"? 5 280 2 3 5 

Asn Thr Lys Giu Vai Arg Glu Leu Ala His Arg Phe Gly Val. Glu Met 
2 90 * 2 9b 300 

Pro lie Thr Glu Glu He Tyr Gin Val Leu Tyr Cys Gly Lys Asr. Ala 
305 310 ^ 315 3^0 

Aru Glu Ala Ala Leu Thr Leu Leu Glv Arg Aid Arg Lys Asp Glu Arc 
325 330 335 

Ser Ser His 

(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 501 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: unknown
(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein

(vi) ORIGINAL SOURCE:

(A) ORGANISM: GLPD

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

Met Glu Thr Lys Asp Leu He Val He Glv Gly Gly He Asn Gly Ala 
1 5 ' 10 15 

Glv He Ala Ala Aso Ala Ala Gly Arg Glv Leu Ser Val Leu Met Leu 
20 ' 25 30 

Giu Ala Gin Asp Leu Ala Cys Ala Thr Ser Ser Ala Ser Ser Lys Lou 
3 5 40 4 5 

He His Gly Gly Leu Arc Tyr Leu Glu His Tyr Glu Phe Arg Leu Val 
50 " ' ' 55 60 

Ser Giu Ala Leu Ala Glu Arg Git: Val Lei: Leu Lys Met Ala Pro His 

6 5 7 0 7; 9 0 

He Ala Phe Pro Met Arc Fne Arq Leu Pro His Arc Pro His Leu Arc 

8 5 90 S5 

Pre Ala Trp Met He Arg lie Gly Leu Phe Met Tyr Asp His Leu Gly 
1C0 105 110 

Lvs Arg Thr Ser Leu Pro Glv Ser Thr Gly Leu Arc ?ne Gly Ala Asn 
115 120 125 

Ser Val Leu Lys Pro Glu lie Lys Arg Gly Phe Giu Tyr Ser Asp Cys 
130 ' 135 14 0 

Trp Val Asp Asc AH A: J Leu Val Leu Ala Asn Ala Gin Met Val Val 
143 150 155 H3 

Arc Lys Gly Glv Giu Val Leu Thr Am Thr Arc Ala Thr Ser Ala Arc 

1 6 5 I <"0 H 5 

Arn Glu Asn Glv Leu Trc He VH Glu Ala Glu Asp Ho Asp Thr Gly 

■ec ib c :?o 
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Lys Lys Tyr Ser Tro Gin Ala Arg Sly Leu Val Asn Ala Thr Gly Pro 

155 ^00 205 

Trp Val Lys Gin Fhe P:ie Asp Asp Gly Me t_ His Leu Pro Sex Pro Tyr 
210 215 220 

Gly He Arg Leu He l.ys Gly Ser His He Val Val Pro Arg Val His 
225 230 235 240 

Thr Gin Lys Gin Ala Tyr Lie Leu Gin Asn Glu Asp Lys Arg lie Val 
245 250 255 

Fhe Val He Pro Tro Met Asp Glu Fhe Ser He He Gly Thr Thr Asp 
26C 265 270 

Val Glu Tyr Lvs Gly Asp Pro Lvs Ala Val Lys He Glu Glu Ser Glu 
275 J 280 285 

lie Asn Tyr Leu Leu Asn Val Tyr Asn Thr His Phe Lys Lys Gin Leu 
2 90 ' 2 95 300 

Ser Arg Asp Asp He Val Trp Thr Tyr Ser Gly Val Arg Pro Leu Cys 
305 " ~ 310 ' 315 32C 

Asp Asp Glu Ser Asn Ser Pre Hn Ala He Thr Arg Asp Tyr Tnr Leu 
325 330 335 

Asp He His Aso Glu Asn Gly Lys Ala Pro Leu Leu Ser Val Phe Gly 
340 345 350 

Gly Lys Leu Thr Thr Tyr Arg Lys Leu Ala Glu His Ala Lou Glu Lys 
355 ' 360 365 

Leu Thr Pro Tyr Tyr Gin Gly lie Gly Pro Ala Trp Thr Lys Glu Ser 
370 " ' 375 380 

Val Leu Pro Gly Gly Ala He Glu G~y Asp Ara Asp Asp Tyr Ala Ala 
385 390 395 400 

Ara Leu Ara Arg Arc Tyr Pro Phe Leu Thr Glu Ser Leu Ala Arq His 
405 410 415 

Tvr Ala Ara Thr Tvr Glv Ser Asn Sor Glu Leu Lou Leu Gly Asn All 
4 20 ' 425 4 30 

Gly Tnr Val Sei Asp Leu Gly Glu Asp Phe Gly His Glu Phe Tvr Glu 
4 3 5 4 4 0 4 4 5 

Ala Glu Leu Lys Tyr Leu Val Asp His Glu Trp Vaj Arg Arg Ala Asp 
450 " 455 460 

Aso Ala Leu Trp Arg Arg Thr Lys Gin Gly Met Trp Leu Asn Ala Asp 
465 * 470 475 430 

Gin Gin Ser Arc Val Ser Gin Trp Leu Val Glu Tyr Thr Gin Gin Arg 
4 55 4 90 4 95 

Leu Ser Leu Ai.a Ser 
50 J 

(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 542 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: unknown
(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein

(vi) ORIGINAL SOURCE:

(A) ORGANISM: GLPABC

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:

Met Lys Thr Arc Aso Ser Gin Ser Ser Asp Val lie lie lie Giy Glv 
1 5 10 1 5 

Gly Ala Thr Gly Ala Giy lie Ala Arg Asp Cys Ala Leu Arg Gly Leu 

20 25 30 

Arg Val lie Leu Val Glu Arg His Asp lie Ala Thr Gly Ala Thr Gly 
35 40 4 C 

Arg Asn His Gly Leu Leu His Ser Gly Ala Arg Tyr Ala Val Thr Asp 
50 ' 55 60 

Ala G-u Ser Ala Arg Glu Cys lie Ser Glu Asn Gin lie Leu Lys Arc 

6 5 7 0 7 5 5 0 

lie Ala Arg His Cys Val Glu Pro Thr Asn Gly Leu Fhe He Thr Leu 
GS 90 9o 

?rc Glu Asp Asd Leu Ser ?he Gin Ala Thr ?he He Arg Ala Cys Glu 

:cb 105 no 

Glu Ala Gly He Ser Ala GxU Ala lie Asn ?rc Gin Gin Ala Arq He 
115 120 125 

Tie Glu Pro Ala Val Asn Pro Ala Leu Tie Gly Ala Val Lys Val Pro 
130 lib 140 

Asp Gly Thr Val Asp Fro Phe Arg Leu Thr Ala Ala Asn Met Leu Asp 
145 ' 150 155 160 

A. a Lys Glu His Giy Ala Val He Leu Thr Ala His G.u Val Thr Gly 
165 170 175 

Leu He Arg Glu Giy Ala Thr Val Cys Gly Val Arg Val Arg Asr. His 
130 18 5 l'jO 

Leu Thr Gly Glu Thr Gin Ala Leu His Ala Pro Val Val Val Asr. Ala 

195 200 205 

Aid Gly He Tip Giy Gin His He Ala Glu Tyr Ala Asp Leu Aru He 
210 * ' 215 220 

Arg Met Phe Pro Ala Lvs Gly Ser Leu Leu He Met Asp His Arq He 

225 230 225 240 

Asn G.lr. His Val Tie Asn Arq Cys Arg Lvs Fro Ser Aso Ala Asn He 
245 250 255 

Leu Val Pre Gly Asp Thr He Ser Leu He Giy Thi Thr Ser Leu Arg 
?_f?. 265 270 

lie Asp Tyi Asn Giu He Asp Aso Asn Arrj Val inr Ala Hu Glu Val 



Asr He ;.eu Leu Arq Glu SI v Siu Lys Leu Ala Pro Val Met Ala Lvs 

2 - - * 2 95 3 2 ■ 
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Thr Arg lie Leu Arc Ala Tyr Ser Giy Val Arg Pro Leu Val Ala Ser 

3C5 ' ' 310 315 32C 

Aso Asp Asu Fro Ser Glv Aid Asn Leu Ser Arg Giy lie Val Leu Leu 
32 5 330 " 3 35 

As z- His Aia Giu Arc Asp G_y Asp Sly Phe He Thr He Thr Giy 

340 345 350 

Glv L-ys Leu Met Thr 'Lyr Arc Leu Met Ala Glu Trp A- a Thr Asp Ala 
355 360 365 

Val Cys Arg Lys Leu Giy Asn Thr Arg Pro Cys Thr Thr Ala Asp Leu 
370 ^ " 31=) 3G0 

Ala Leu Fro Giy Ser Gin Glu Pro Ala Glu Val Thr Leu Arg Lys Val 
385 ' 390 395 " ' 400 

He Ser Leu Pre Ala Pro Leu Arq Giy Ser A_a Val Tyr Arq His Giy 
4G5 410 415 

A 3D Arc Thr Pro Ala Trp Leu Ser Glu Giy Arc Leu His Arq Ser Lev. 
4 2C 4 25 4 30 

Val Cys Glu Cys Glu Ala Val Thr Aia Giy Glu Val Gin Tyr Ala Val 
435 ~ 440 445 

Glu Asn Leu Asn Val Asn Ser Leu Leu Asp Leu Arg Arg Arg Thr Arc 
4 50 4 55 4 60 

Val Giy Met Giy Thr Cys Gin Giy Glu Leu Cys Ala Cys Arc Ala Ala 
4 6b " 4VC 4 75 4 8C 

Giy Leu Lou Gin Arc Phe Asn Val Thr Thr Ser Ala Gin Sor Ho Glu 
435 4 9C 4 35 

Gin Leu Ser Thr Phe Leu Asn Glu Arq Trp Lys G_y Va. Gir. Pro lie 
SCO 505 510 

Ala Tro Giy Aso Ala Leu Arg Glu Ser Giu Phe Thr Arg Trp Val Tyr 

515 * 520 525 

Gin Giy Leu Cys Giy Leu Glu Lys Glu Gin Lys Asp Ala Leu 
530 535 54 0 

(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 250 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: unknown
(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein

(vi) ORIGINAL SOURCE:

(A) ORGANISM: GPP2

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:

Me: Glv Leu Thr Thr Lys [rr Leu Ser Leu Lys Vil Asn .-la A \ j Lou 



rhe Asp Va. Asp -j-.y :!:r .:e -^e - er o-.n rrc .-.^d ^ - tr ma 
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Phe Trp Aro Asp Phe Gly Lys Asp Lys Pro ?yr ?he Asp Ala Glu His 
35 40* 41 

Val lie 31 n Veil Ser His Gly Trp Arg Thr Phe Asp Ala lie Ala lys 

5 3 5. c 60 

Phe Ala Pro Asp Phe Aid Asn Glu Giu Tyr Va Asn Lys Leu Glu Ala 
65 "7 0 "7 5 gj 

31 j lie Pro Val Lys Tyr Gly Glu Lys Ger lie Glu Val Pre Gly Ala 
8b ' 90 95 

Val Lys Leu Cys Asn Ala Leu Asn Ala Leu Pro Lys Glu Lys Tro A_a 
100 105 110 

Val Ala Thr Ser Sly Thr Arg Asp Met Ale Gin Lys Cro Phe Glu His 
115 120 125 

Leu Gly lie Arg Arg Pro Lys Tyr Fhe lie Thr Ala Asn Asc Val Lys 
130 135 140 

Gin Gly Lys Pro His Pro Glu Pro Tyr Leu Lys Gly Arg Asn Gly Leu 
145 " " 150 " 155 " 160 

Gly Tyr Fro lie Asn Glu Gin Asp Pro Ser Lys Ser Lys Val Val Val 
165 " 170 175 

Fhe Glu Asp Ala Pro Ala Gly lie Ala Ala Glv Lvs Ala Ala Gly Cvs 
180 135 190 

Lys T.'e lie Gly lie Ala Tnr Thr ?ne Asp Leu Asp Phe Leu Lyb Glu 

195 200 205 

Lys Gly Cys Asp lie lie Val Lys Asn His Glu Ser lie Arc Val Gly 

?.: 0 2 15 220 

Glv Tvr Asn Ala Glu Thr Asp Glu Val Glu Phe lit; Fhe Asp Asn Tyr 
225 233 235 240 

Leu Tvr Ala Lvs A.sp Aso Leu Leu l^ys Tro 
245 250 

(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 709 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: unknown
(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein

(vi) ORIGINAL SOURCE:

(A) ORGANISM: GUT1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:

Met. Fhe Pre Ser Leu Fhe Aru Leu Val Va* Phe Ser Lys Ai u Ty: lie 
1 5 " 10 15 

T'hc. Arg ;>r Ser G'.n Arg Leu Tyr Tor , c er Leu lys Glr. Giu Gin Ser 



Arc Met. Ser Lys He Met Glu Asp Leu Arg Ser Asp Tyr Val Pre 
3 5 AC 4 5 
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lie A- a Ser He Asp Vai Giy Thr Thr Sex Ser Arg Cys lie Lou Phe 
50 ' 55 60 

Asn Ara Trp 3_y Gin Asp Vai Ser Lys His Gin lie Giu Tyr Ser Thr 

6 5 7 0 ^5 BO 

Ser Air. Ser Lys Gly Lys He Gly Vai Ser Giy Leu Arg Arc Fro Ser 
tfb* y0 9 5 

Thr Ala Pre Ala Arg Glu Thr Pro Asn Ala Gly Asp Ho Lys Thr Ser 
100 1C5 11C 

Gly Lys Pro He Phe Ser Ala Glu Gly Tyr Ala He Gin Glu Thr Lys 
115 120 125 

Phe Leu Lys He Glu Glu Leu Asp Leu Asp Phe His Asn Glu Fro Thr 
130 135 140 

Leu Lys Phe Pro Lys Pro Gly Trp Vai Glu Cys His Pro Gin Lys Leu 
145 ' 150 ' * 155 160 

Leu Vai Asn Vai Vai Gin Cys Leu Ala Ser Ser Leu Leu Ser Leu Gin 
165 ' 170 175 

Thr He Asn Ser Giu Arg Vai Ala Asn Gly Leu Pro Pic Tyr Lys Vai 
180 135 190 

He Cys Me: Gly He Ala Asn Met Arg Glu Thr Thr lie Leu Trp Ser 

19b 200 205 

Arc Ara Thr Glv Lys Pre lie Vai Asn Tyr Gly He Vai Trp Asn Asp 

210 215 220 

Thr Arg Thr He Lys He Vai Arg Asp Lys Trp Gin Asn Thr Ser Vai 
225 * 230 235 240 

Asp Arg Gin Leu Gin Leu Ara Gin Lys Thr G'y leu Pro Leu Leu Ser 
245 250 255 

Thr Tyr Phe Ser Cys Ser Lys Leu Arg Trp Phe Leu Asp Asn Giu Pro 
260 " 265 ' 270 

Leu Cys Thr Lys Ala Tvr Glu Hu Asn Asp Leu Met Phe Giy Thr Vai 
275 ' 230 235 

Aso Thr Trp Leu He Tyr Gin Leu Thr Lys Gin Lys Ala Pne Vai Ser 

29C 255 300 

Asp Vai Thr Asn Ala Ser Arg Thr Gly Phe Met Asn Leu Ser Thr Leu 
305 OH 31 5 520 

Lys Tvr Asp Asn Glu Leu Leu Glu Phe Trp Gly lie Asp Lys Asn Leu 

325 330 335 

He Hia Met Pio Giu He Vai Ser Ser Ser Gin Tvr Tvr Giy Aso Phe 
34 J 345 550 

Gly He Pre Aso Trp 7 ; e Ker Hu lys Leu H ; n Asp Sr-r Prr. lys Tnr 
5 5 5 3 6 0 3 •: 5 

VH Leu Arg Aso Leu VH Lys Asn Le„ ric Hr Glu Sly Cys Leu 

* i -\ ~ - c 5 8: 



Glv Aso Glu Ser Ala Ser Met Vai Glv Sin Leu Ala Tyr Lv« 

585 ' 3.0 795 
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Ala Ala Lys Cys Thr Tyr Gly Thr Giy Cys Phe Leu Leu Tyr Asr. Thr 
4 05 ' ' 4 1C 4 15 

Giy Tnr Lys Lys Leu lie 3er Gin His Gly A_a Leu Thr Thr Leu Ala 



Phe Trp Phe Pro His leu Sir. Glu Tyr Gly Gly Gin Lys Pro G^u Leu 
4 35 4 4 0 445 

Ser Lys Pro His Phe Ala Leu Glu Gly Ser Val Ala Val Ala Giy Ala 
450 455 460 

Val Val Gin Trp Leu Arg As? Asn Leu Arg Leu lie Asp Lys Ser Glu 
4 65 " 4 70 475 4 60 

Asd Val Gly Pre He Ala Ser Thr Val Pre Asp Ser Gly Glv Val Val 
485 490 495 

Phe Val Pro Ala Phe Ser Gly Leu Phe Ala Pro Tyr Trp Asp Pro Asp 
50C ' 5C5 510 

Ala Arc Ala Thr He Met Gly Met Ser Gin Phe Thr Thr Ala Ser His 

515 52C 525 

lie Ala Arg A. a Ala Val Glu Gly Val Cys Phe Gin Ala Arg Ala lie 

530 535 54 0 

Leu Lys Ala Met Ser Ser Asp Ala Phe Gly Glu Gly Ser Lys Asp Arg 
545 " 550 ~ 555 560 

Asp Phe Leu Glu Glu He Ser Asp Val Thr Tyr Glu Lys Ser Pro Leu 

565 570 575 

Ser Val Leu Ala Val Asp Gly Gly Met Ser Arg Ser Asn Glu Val Met 
580 535 590 

Gin He Gin Ala Asp lie .eu Gly Pro Cys Val Lys Val Arc Arg Ser 
595 600 605 

Pro Thr Ala Glu Cys Thr Ala Leu Gly Ala Ala lie Ala Ala Asn Me~ 

610 615 62 0 

Ala Phe Lvs Asp Val Asn Glu Arg Pre Leu Trp Lys Asp Leu His Asp 
625 " 630 635 64 0 

Val Lys Lvs Trp Val Phe Tvr Asn Gly Met Glu Lys Asn Glu Gin lie 
64 5 650 655 

Ser Pro Glu Ala [lis Pro Asn Leu Lys He Phe Arg Ser Glu Ser- Asp 

660 665 670 

Asp Ala Glu Arg Arq Lys His Trp Lys Tyr Trp Glu Val Ala Val Glu 

675 6B0 685 

Arg Ser Lys Gly Trp Leu Lys Asp He Glu Gly Glu His Glu G_n Va. 

6 90 69 5 7 00 

Leu G i v. Asn Phe G 1 n 



(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 12145 base pairs
(B) TYPE: nucleic acid



420 



4 30 
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:
(A) ORGANISM:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:



AAAATTCA3G ATGTCGCCGG TATAGTT7TT GATAATCAGC AAGACGCCTT CGCCGCCGTC 12 0 

AAITTG:A?C GCGCA7TCAA ACATTTTOTC CGGCGTCGGO GAGGTGAATA T7TCCCCCGG 18 0 

ACAGGC3CCG GAGAGCATGC CCTGGCCGAT A7AGCCGCAG TGCA7CGGTT CATG7CCGC7 240 

gc:gcc-70-:g gagagcaggg CCACCTTGCC AGCCACCGGG GCGTCG3TGC GGGTCA7ATA 300 

CA3CGGGTCC TGATGCAGGG TCAGCTGCGG A7GGGCTTTA GCCAGCCCC7 GTAATTGTTC 360 

AT7CAGTACA 7CTTCAACAC GGTTAATCAG 0TTTTTCAT7 ATTCAG7GC7 GCGTTGGA3A 420 

AG GTTCGATG CCGCCTC7CT GCTGGCGGAG 3CGGTCATCG C3TAGGGGTA TCGTCTGACG 4 80 

GTGGAGCGTG CCTGGCGATA TGATGA7T 2T GGC7GAGCGG ACGAAAAAAA 3AA7GCCCCG 54 0 
ACGATC3GGT TTCA77ACGA AACA7TGCTT 3C7GATTTTG 7TTCTTTATG GAACGTTTTT GOO 
GCTGAGGATA 73GTG/AAA7 3CGAGC7G3C GCGCTTTTTT 7CT7CTGCCA 7AAG 3GGC3G 6 60 
TCAGGATA3C CGGCGAAGCG GGTGGGAAAA AATTTTTTGC 7GA7TTTCTG CCGACTGCGG 720 
GAGAAAA3G3 G3TCAAACAC GGAGGAT7GT AAGGGCATTA TGCGGCAAAG GAGOGGATOG 780 
G GATCGCAAT OCT G AC AG AG ACTAGGG77T TTTGTTCCAA TATGGAAOGT AAAAAAT TAA 84 0 
CCTGTGTTTO A7ATCAGAAC AAAAAGGCGA AAGAT7TT7T T-37TCCC7GC CG3CCCTACA 900 
3TGATC3CAC :GC7CCGGTA CGGTCCG7TG AGGCCGCGCT TCACTGG 2CG GC 3CGGATAA 960 

0 GCC AGS GC7 OATCATGTCT ACATGCGCAC 77ATT7GAGG GTGAAAG 3AA TGCTAAAAGT 13 2 0 

rATTCAATcr o:ag:oaaat atottcaggg toctgatgct g:tgttc7Gt tcggtcaata :r^c 

TGCCAAAAA3 7TGGCGGAGA GCTTCTTCGT CATC3CTGAC GATTTCG7AA T3AAGCTGGC 1140 

GGGAGAGAAA GTGGTGAATG GCCT3CAGA3 CO AC GAT ATT C.GCTGCCAT3 CGGAACGGT7 120C 

TAACG3CGAA 7GC.AGCCATG CGGAAATCAA COGTOTGATG G0GA7TTTG0 AAAAA0AGG3 12 60 

CTGCCGCGGC GTGGTCGGGA TCGGCGG7G3 TAAAACCCTC GATACCGCGA AGGCGATCGG 132 0 

TTACTA30A3 AAGCTGCCGG T3GTG3TGA7 CCC3ACCA7C GCCTCGACCG A7GCGCCAAC 1380 

C A 30 GOG 0 70 TCGG7GA7'"T AOAC 3GAAGC 3GG0GAGTT7 3AA3AGTATC T 3ATCTATCC 1 4 4 J 

GAAAAAOOCO 3A7A7GG7 3G T GAT 3 G AC AO 3 3CGAT7ATC JO C AAA GO 30 0 3GTAOGCC7 150? 

GOTGG70TCO 3GCA7GGG23 A7003070TI 3AOOTGGTTC GAGGCOAAAG OTTOOTAOOA 1560 

03OG0GO30: AOOAG OAT 3 3 OOGGAGGAOA 31' OOAOCGAG GOGGOOOTGA 000700000 3 162 j 

007GTGC7AT GATACGCTG 3 TOGOO3A0O0 OOO^VAGGOO 00 TO TOGO 03 O0OAG3CO33 16^0 



gtoga:oa;c 



ACGGTGGTGA CT7TAATGCC GCTCTCA7GC A3CAGCTCGG T3GC3GTCTC 
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G3TAGTGACC GAAGCGCTGG AGC3CATCA7 

CTTTGAAAGC AGTGGCCTGG 7CGCTGCCCA 

AGAGTGCCAT CACCT3TATC ACGGTGAGAA 

GCTGCA3AAC AGCCC3ATGG ACGAGATT3A 

CCT3CCGGTG ACGCTCGC3C AGA7GGGCG? 

GGTGGCGAAA GCTACCTGCG CGGAAGGGGA 

cccggaga3c gt7catgccg ctatcctcac 
gcg7taat7c gc3gtggcta aaccgctggc 
3 3cagtcgct gccggagggg ttctc7atgg 
ac tc ag gat a ccgggaaggc ggtctcttcc 
aagtttatg: agcgcgaaac ctggcaaacg 

7CCATCTGTC GGCGTAAAAC CGCGCTGCTC 
7G3GAGTTTA TGCA7CGCCG CCCCTGCGCG 
CT3AGCC3TT GCGG7GAGCC GCAAACCCTG 
GGCA3CTATT GTCCGGAGAG CATTATCGGC 
GGCCAGC 2GA TCAACACCGO CGGCGATCG3 
TTTT3CTCGA 3GCCGGTGTT TGATAACCAO 
TGTCTGGTC j AGO AC C AG T 2 :AGCGC"GA: 
GTGGGTAACT CCCTGC7TAC CGACAG7CTG 
ATGTACGGCC TGCTGGAGAG 2ATGGACGAT 
2TGCAGTTTC TCAATG7TCA GGCGGCGAGA 
■3GGAAAAATA TC3CC3AT3T 3GTGACCCTC 
GCCCGCSG CC TGAATCAC3T OGAAGTCAGC 
GT'GATCACCT 7AAAACCGAT 7GTCGAGGCG 
CCGGTGGAGC AGATGCGGCA GCTGA7GACC 
GAGCA3ATGT CTSCCGACGA TCCCGAAACC 
GCGCGGGGCG GC7TCCC3GT GCTACTGTGC 
AGCCAGGC7A TTCAGAATCA AAGCGAA7G 3 
CA3CTATATG CCGACAGCGT GC7G3GCCA3 

gaaaatggtc ggc7gagccg cct7gagct3 
atcgagtat: tggcgcc3ga gctgcagtcg 

C7CACCCGC2 TG3ACGC2CG GCGGCTGATC 
P.CCG7 23ATC CGGGCAATC7 3GT3GAACAG 
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7GAGGCGAAC ACTTACCTCA GC3GCATTGG 174 C 

TGCAATCCAC AAC3GTTTCA CCATTCTT 3A 13 00 

AGTGGCCTTC GGTACCCTGG CGCAGCT3GT 13 60 

AACGGTGCAG GGC7TCTGCC AGCGCGTCGG 1 920 
CAAAGAGGGG ATCGA7GAGA AAATCGC7GC 

AACCATCCAT AATATGCCGT TTGC3GTGAC 204 0 

CGCCGATCTG 7TAGGGCAGC AGTGGCTGG7 2100 

CCAGGTCAGC GGTTTTTCTT TC7CC:C7CC 2160 

TA2AACGCGG AAAAGGATAT GACTGTTCA3 2220 

GTCATTGCCC AG7CATGGCA CCGCTGCAG2 22 8 3 

CCGCACCAGG CCCAGGGCCT GACC7TCGAC 2 34 0 

ACCATCGGCC A3GC3GCGCT GGAAGACGCC 24 CO 

CTGTTTATTC TTGATGAGTC CG3CTGCATC 34 60 

GCCCAGC7GG CTGCCCTGGG A7TTCGCGAC 2520 

ACCT3CGCGC TGTCGCTGGC OGCGATGCAG 2SiO 

CAIT7TAAGC AGGCGCTACA GCCATGGA3T 2 64 0 

3 3GCG 3 OTGT 7CGGCTCTA7 :TCG2T7T3: 27 30 

JTCTCCGTGA CGCTGGCCAT 3GCCCGCGA- 4 37oQ 

7TGGCG 3AAT CCAACCGTCA 2CTCAATCAG 2 820 

3GGGTGATGG CGTGGAACGA ACAGGGCGTG 2fi30 

CTGCTGCATC TTGATGCT7A GGCCA3CCAG 2 94 0 

CCGGCZCIGC 7GCGCCGCGC CATCAAACAC 300C 

7TTGAAAGTC AGCATCAG7T TGTCGAT3CG 3D 60 

CAAGGCAA7A GTTTTATTCT GC7GCT 3CAT 3 120 

AGCCA3CTCG GTAAAGT3AG C:ACACC7Tr 3180 

CGACGCCTGA TCCACTTTGG C2GCGA3GCG 324 0 

GGCGAAGAGG GGGT CGG3AA AGAGCTGCT 3 3300 

GZGGGGGGZC :CTACATCTC 23TCAAGTGC 3 3 6 0 

GACT7TA7~G CCAG2GC 2CZ TAG OCA CO AC 3420 

GCCAA 3GG 3 3 GCAC OCT 3TT TCT 3GAAAA 3 j 4 fc j 

OCT TT ~0T3C AGGTGATTAA GCA3GGCGT0 7 540 

CCG3T33AT3 rGAA^GPjAT TGC3ACCAC3 JcjO 

AACCCCTTTA 3CCG2CA3CT 3TACTA7GC" 3 660 
ctgcactcc7 ttgagatcg7 catcccgccc ctgcg37ccc gacgcaacag tattccg7c 3 272c

37g3t3 3ata accggttgaa 3ag3ctggag aa3cg7ttct ct7cgcgact gaaa3tgga3 37 6 c 

gatga:g:g: tggcacagct 3GTggcctac tc3TGGCcgg ggaatgactt t^ag :tgaa: j - 4 o 

agcgt jA.rr j agaatatcgc catcagcag3 gacaacggcc a:attcgcct gagtaatct 3 3>oo 

CCG3AATAT3 TCTTT77CGA GCGGCCGGGO GGGGATAGCG C37CATCGCT G3TG CCGGCC 3?-5C 

A3CCT 3 ACT I 7TAGCGCCAT CGAAAA 3GAA GCTAT7ATT3 ACGC CGCCCG G3TGACCAGC 4--'f^C 

33GCG33T3C AGGAGATGTG 3CAGCTGC7C AATATCGCCC 3 3ACCACCCT GTGGCGCAAA 4080 

ATGAA 3CA 3T ACGATAIT3A OGCCAJCCAG TTCAAGCGCA AjCATCAGGC GTAGTGTCTT 4 : 4 >: 

33A7TCGC3C CAT GG AG AAG AGGGCATCC3 ACAGGCGATT GGTGT AGCGT TTGA3CGCGT 4J0C 

3GCGCAGC3G ATGCG3GCGG TCCAT3GCCG TCAGCAGGCG TTCGAGGCGA C3GGACTGGG 4j6l 

TGCGC 3CCAC GTGGAGGTGG GGAGA3GGGA GATT 3CTCCC GGGGATGAC3 AACTGTT7TA 4 320 

ACGGGCCGCT CT7GG3CATA 7T 3CG 3TCGA TAAG CCGC IC CAGGGCGGT3 ATCTCCTCTT 4 33 0 

:3CCGATC3T G? 3 3C7CAGG CG3GT3AGG3 CCCG3GCAT3 G3TGGC3AGT T:AG:g:GCA 4-4 0 

3CACGAAGAG G3TGT3GTGA ATATG3TGCA GGCTTT77:G CAGCCC3GC3 TCGGGG 3TCG 4000 

TGGCGTAGCA GAC3GCGAGG TGGGATATCA GT73ATC3AC G3TGCGGTAG GCCTCGACGC 4 C .»00 

GAATATCGTC 7TTCTCGA7G CGGCTGC 3GC C 3TA CAGGGC GGTGGT3CCT T7ATCC 3CGG 4>'=20 

L'GCGGGTATA GATA 3 3 AT A 3 ATTCAGTTTC T:t:AGT7AA CGG'ZAGZAZZ T7AACCAGC7 408- 

3GGGG3GG7T G 3CGCCGAGC GTACGCA3TT GATC3TC3 3T A7CGGTGACG TGTCCGGTAG 4 1 4 0 

3CAGCGG037 GTC3GC03GC AGCTG 3 3 CAT 3AGTGA3GG 3 TATCTC3CC3 GA3GC3CTGA 4*00 

3GCCG AT A 33 CACC3 3 3AGG GGCGAGCTTC T3G3CG3CA3 GGC3CC 3AGC GCAGCG 3CG7 4*0 j 

7ACCGCC73C G TO AT AG 37T ATGGTCT3GC A3 3 3 3A3CC 3 C7G 37CCTC3 A3C3C 3 3AGG 4""<20 

A3AGC7CA7T GA733GG3GG GCATG3T3CC C33GCGGAT7 GTAAAACA 3G C3TACJ7GTG l-'dO 

3CGGT GAAAG C 3ACATGACG 3T3C3CT3GT TAA3A3TCA3 AAT33C7333 33AAAATC33 504 ; 

GG GA-A7GrCC T3CT3GTTG3 3T7TA3GGG3 GTT3GA3AA3 GCATTGCC 37 3TTTTA3A3C 5 10 J 

CATCTCC3CC ATGTA3 3 3GA A3rC3G3CT3 T7TTAC3 3 37 AGA73GCG3A 3AT33TG33G :10; 

AATA333A7A T3CAT3GA3A GAGGGGTGAT AGG3GC3ATG G3T7TT7CC3 3CG33TC3A3 5;:/" 

AGTGGACAGT C3GGTGA7AT T TTC 30 37 AT GAG 1TCAGCG ATATC GGC3A ATTTCTCC 3G 028: 

GT7G373AT3 A33TTGTAG3 G3G33ACAT3 73G3AG3A3G AC AG 3 3TT 33 33A3GCCGTC- 534 : 

~ 33CATG773 TACA3GCG33 CCAG3T33T3 7333AT3G33 7GCAC3TA 32 3 3A JGTTGGC :4C0 

GT7A7T 3AAA 3 3 3A7 3CC3 3 C3A3 3A3A3A AG3ATAG3C3 ATGTTTT 3 OC 3 73 3 37 3 3A 3 » 4 6 J 

03 C 33333-33 7 3 337 3 A 3 3 3 G3T7AG33T3 7TT ^'jA-^.-iTA TAG«_-3CTC7A C3333TG3G7 _^^0 

CA333C'"3C A7 , CCC3G73G CCGC0G7CA3 GGC3 3C3 3 3T 7TACC3ATCA 7CA3CAGTGG 5 c 4 0 
atggttgata gagac3GA3G gcagtttgcg cgagctgacg atcagaaact tcactttggt stoc

Ti 33GTGTTG G1CAG3AC3C AGTGGCGGGT GAGCTCGCTG GC3GTGCCGG CGGTGGTATT b'/i.ij 

GAG GGCGACG ATAGG333 3A GC3GG7TGGT IAGGG7CTC3 ATTCC 3GCAT ACTGGTAGAG :-o20 

ATC 3CCCT 3A TGGGT3GCGG CGATGCCGAT GCCTT73CC3 CAATC 3TGCG GGCTGCCGCC :e:rO 

GC33AC3GTG ACGATGATG7 CGCAGTGTTC GCGGCGAAAC ACGGC 3AGG3 CGTCGCGCAC C ?'I0 

GTT 3GTGT CT TTCGGGTTCG GC7CGACGCC GTCAAAGATC GCCACCTCGA 'ICCCGGCCTC tC-nQ 

CCG:AGATAA TGCAGGGTTT TGTCCACCGC GCCATCTTTA ATTGCCCGCA GGCCTTTGTC (=C»nO 

GGTGACCAG3 AGGGCTTTTT TCCCGCCCAG CAGCTGGGAG CG7TCGC3GA CTACGGAAAT t.:U0 

G3CGTTGGGG CGAAAAAAGT TAACGTTTGG GAG CAGATAA TCAAA3ATAC GA7AGGTCAT 61*0 

AATATA0C7T C7CG3TTCAG G TTATAATGC GGAAAAA3AA TCCAG^G::GG AC7G3GC7AA 02m 'J 

TAATTGAT3C TGGTGGAG:G TACG3CCGCT AAC3G3GACG GGGC3AATTA :GTG3T3ATT 6300 

AA.AAATAAGT GGCA3GCCGC C3CCAAAAAT AATAATT3GC TGTT 3GT7GG TTAG3T3CA3 ol'-iO 

ACC3TA3AGA 3ATT3T3CTG G 3TG3ACCG3 T3ACGTAATT TGAT 3GGTAG CTTG3TTCAG 6420 

GCTGCAGGGG 3TGGAGGOTT TATTCAGGGA AATATCGCAG C7GGAGACGA AGGGGTGGT: 64*J 

3ATCCGCTGG ATAAGCAGCG TGTTGCCT3C GCG3TCAACT A3GGAAAACA C 3ACCG GC A 3 5 S 1 G 

GTTGAT 37 3A GTG GCT7TTT T7T GGAGGGG CG3GGCCATT TGCTGGGCG 3 CGG 3CA3GG7 ni'.C,^ 

GATTG7C7GA ACT T3T73G3 TGTTGTTGA7 CATTCTCT Z 3 GGIACGAGGA TAA33 37 3GG ot'cO 
GC3AATAGTC AGTAGGGGG 3 3ATAGTAAAA AACTATTA3G A7TCGGTTG3 CTT337TTA3 1 :*n 

TTTTGTCAGC GTTATTTTGT 3GCCCGCCA7 GATTTAGTCA ATA3GGTTAA AATAGCGT CG -5 "7 SO 

GAAAAACGTA AT7AAGGGCG T7T7T7AT7A AT T GAT T TAG A7GA7TGCGG GCGA7GACA7 -^4 0 

TT7TTAT7TT TGGGGCGGGA 3TAAAG7TTG ATAGTGAAAC TGTGGGTAGA TTT3GTGTGG b?'J0 

3AAATTGAAA 3 G AAAT 7 AAA ITTATT7TTT TCACCAGT3G 3T3ATTTAAA 3TTG33CTAT o-'-P 
TGCCGGTAA7 GGC 3GGGCGG 3AACGACGGT GGCCCGGCG: ATTCGC7AGC jTC 7 jGGG AT 

TTCA:3TTT7 GAG 3 GG AT 3 A AC AAT GAAAA GATCAAAACG ATT7GGAGTA GTGGGGGA3G 7-:3D 

3CCC33TCAA I'CAiGACGGG CTGATTGGGG AGTGGCCT GA AG AG G GGC PG AT GGC CAT 3G 7 140 

ACAGGGGCT7 T3A3G3GGT3 7C7TGAGTAA AAGTGGACAA 3GG7GTGATG GTCGAACT33 " 7 3>0 

ACGG3AAACG GC GG GAG C AG : TTGACATGA TCGACGGATT TATCGGCGAT 7ACGGGATCA "'360 



AG 3 3 3ACA3AGCA3 GCAATGC3CC T3GAGGG33T GGAAA7AGCG CGTATG 3TG~ 



4 \ C 



T3GATA7TCA GG'I 3AGCCGG GAGGAGA PGA TTGCCATCAC T ^GCG7 ""A7" ACG~333C3A 

AAv? GG 37CG.- 4 - Goi'jAI co^o ^r.oAi'v-.m. j . ju. ooAjnl <^.~v - - — : j o _no."^'jn 

TGCGTGGCGA GGGGAG7G7G TGGAA^GA'-T G:"CA3GT3AC GAAT373AAA 3ATAA7G 3 33 "-3-r 

TGCAGATTGG GGGT3A3GG3 G3CGA3GG33 GGA73C33GG CT7CTCAGAA GAG3AGAGGA -^6i 






WO 98/21339 PCT/US97/20292 

GCG3CCGCCC CGGCGTGTTG ACGCAG7G:T CGGTGGAAGA GG3CACC3AG CTGGAGCTGG 7 63C 

3CATGCGT3G CTTAACCAGC TAC3CCGA3A CGGTGTCGGT CTACGGCACC GAAGCGGTAT 774C 

TTACC3AC3G 3GATGA7ACG CCG7GG7CAA AG3CG7TCCT CGC:TCG3CC TACG3CTCCC 7 30C 
3 JGGGTT'.'AA AATGC3CTAC ACCTCCGGTA CC3GA7CCGA A3C3CTGAT3 G3CTATTCG 3 

a:-agcaact: gatgctctac ctcgaatcsc gc73CAtctt cattactaaa ggcgxgggg 792c 

T TCA "3GGA-J I GCAAAACG3C GCGGTSAGCT GTATCGGCAT GAC 33GCGCT GTGCCGTCGG 7 9c*0 

CCATTCGGGC GGTGCTGGCG GAAAACCTGA TCGCCTCTAT GCTCGACCTC GAAGTGGC G7 6 3 4 0 

CC3CCAACGA CCAGACTT7C TCCCACTCGG ATATTCGCCG CACCGC3CGC ACCCTGATGC 61 JO 

AGATGCT 3CC GGGZACCGA: TTTAT7TTCT CCGGCTACA3 CGCGGTGCCG AACTACGA3A 6160 

acatgtt :g: cg^ctcgaac TT7GATGCGG AAGATTTTGA TGATTACAAC ATCCTGCA3C 82:: J 

gtga:ct3at gg7t3ac3g z gg3ctgcgtc cggtgaccga ggcggaaacc att3ccattc s290 

gcca3aaag3 ggcgcgggcg atccac-gcgg t7ttccgcga gct3gggctg ccgccaatcg 3340 

ccga3ga3ga gg7g3aggcc gccac2tac3 cgcacg3cag caacgag atg ccgccgcg7a -j 4 co 

acgt3gtg:a ggatctga 3t gcgg7ggaag aga7gatgaa gcgcaacatc accggcctcg 3460 

atattgtcg3 cgcgctgag2 cgcagcggc7 ttgaggatat cgccagcaat attctcaa7a ?bi_0 

tgctg:g7ca gggggtca:: ggcga7TACC tgcagacctc ggccattctc gatcggcagt 8580 

TCGA3GT j 37 GAGTGC3GT 3 AAC3ACA1CA AT G ACT AT CP. GG3GCCGGGC ACCGGCTATC 8G4'J 

G:ATC7C73G CGAACGGTG3 G3G3AGATCA AAAATA7TCC GGSCG7GGTT CA3ZCCGACA 8730 

c2attgaata aggcggtatt cctgtgcaac agacaacc3a aattcagccc tc7rt7a3cc 87.50 

t 3aaaacccg cgagggcggg 3tagct7ctg cc3atgaacg cgccga13aa gtggtgatcg 8320 

gcgtcgg::: tgccttcgat aaacaccagc atcacactct gatcgatatg ccccatggcg seso 

cgatcctcaa agag:tgatt gccggggtgg aagaagaggg gcttcacgcc cgggtgg7gc 8 94C 

GCATTC7 j3G CACGTCCGAC GTCTCCTTVA TGGCGTGGGA T3CG3CCAA Z C TGAG' - :GG=. :Y 900C 

C3GGGAt:g3 :at:ggtatc cag7:gaag3 ggacgacggt oatczatcao cs:gatc7GC 90?c 

TGCCGCT:AG CAACCTGGAG CT3TTCTCCC AG3C3CCGCT GCTGAOGCT 3 GAGACCTACC 01 Of 

3GCAGATT3G CAAAAACGCT GC 3 2 3CTATG :3 2GCAAAGA G7CA:CTTC3 CCGG7 3CC33 ? 1 : : 0 

TGG7GAACGA 7CAGATGG7G CGGC^GAAAT TTATGGCCAA AGCCGC3CTA T7TCATATCA 02 4 0 

AAGAGACZAA ACAT37GGTG ZAGGA3GCC3 A322CGTCAC CCTGCACAT 2 3ACT7A3TAA 9300 

GGGAG~G-CC ATCJAGGGAGA AAA-2 3ATGZ3 C3T3CAGGA7 TATCCG7TA3 CZAC™CTC 9360 

CO CG GAG OAT ATCCTGAC3C CTAC3GGCAA A3CATT3ACC 3 AT ATT AC CC 7 2GA3AAG3T -423 

a^ctotooo o-agg^ggc:: cgoazgatz7 ::goat:tcc CGCCAGACCTC 7TGA:TACCA 94 S 0 

GGCG3AGATT GCCGA3CA3A T37A3C37GA TGCGGTGGZG C32AATTTCC G2CGC3CGGC 9b4 0 

3 GAG 77 7 AT 2 '1 7CA"7 2273 AC3AGC33.A7 7CTGG37A7C 7ATAACG3GZ 7333237377 -62: 
CCGCTCCTC3 CAG3CGGAGC TGCTG3CGAT CGCCGACGAG CTGGAGCACA CCTGGGATGC 9 € 6 C

GACAGT3AAT GC'l'GCCTTTG ?:CGGGAGTC 3GCGGAAGTG TATCAGCA3C GGCATAAGCT 9710 

G:GTAAAGGA AGC7AAGCGG AGG7CAGCAT GCCGTTAATA GCCGGGAT7G ATATCGGCAA 97 60 

C jCCACCACC GAGGTGGCGC TGGCGTCC3A CTACCCGCAG GCGAG3GCGT 7TGTTGCCAG 9 a 4 0 

C 3 33ATCGTC GC 3ACGAC33 G2A7GAAA3G GAC3:3G3A: AATATCGCCG GGACCCTCGC 99CC 

I'GCGCTGGAb GAGGC7CT3G CGAAAACA3C GT33T7GAT3 AGGGATGTCT CTCGCATGTA 99tX 

TCTTAACGAA 3CCGCGCCG 3 7GATTGGCGA TGTGGCGATG GAGACCATCA CC GAG AC CAT I002C 

TAl'CACCGAA TCGAC^ATGA TCGGTCATAA CC^GCAGACG CCGGGCGGGG TGG33GTTGG lOObt" 

CGTGG3GACG ACTAT3GCCC TC3G3CG3CT 3 3CGACGC7G CCGGCGGCGC AGTATGCCGA 1CM0 

3GGGT3GATC GTACTGAT7G ACGACGCCGT C 3AT77CC7T GACGCTGTGT GG7G3CTCAA 10200 

TGAGGCGCTC GACCGGGGGA TCAA3GT3GT G3CGGCGATC CTCAAAAAGG ACGACGGCGT 1C260 

GGTGGTGAA" AAGCG7CTGC GTAAAACCCT G 3CGGTGGTG GATGAAGTGA CGCTGCTGGA 1C32Q 

3-:AG3T:CC: GAGG3GGTAA TGGCGGC3GT G3AASTG3CC GCGCCG3GCC A3GTGGTGCG 10360 

3ATCCTGTCG AA7CGCTACG GGATCG3CAC C7TC7TCGGG C7AAGCCCG3 AAGAGACCCA 10440 

GGCCA7CG7C CCCATCGCC3 gcgccctgat TGGCAACCGT TCCGCGGTGG TGCTCAAGAC 1050:3 

zgzgcagggg gatgtgcagt cgcgggtgat ccgggcgggg aacc7ctaca ttagcggcga 105*3 

aaagcgcc 3 2 ggagaggcc3 atgtcgccga gggcgcg3aa gccat-7atgc aggcgat 3 ag 106 j j 

cgcctg:3:t zczgz;kcgcg acat:cgcgg c3/\accgggc accca:gccg gzggcatgct 10610 

tgagcgggtg c3-3aaggtaa tggcgtccc7 '3accgg7ca7 gagatgagcg cgatatacat 1j/40 

CCAGGATCTG CTGGCG3TGG ATACGTTTAT TC:GCGCAAG GTGZhGGGZG 3GATGGCCGG 108 DO 

CGAGTGCGC:: ATG3AGAATG CCGTCGGGAT GG7GGC 3ATG 3TGAAAGC3-- ATCGTCTGCA 10860 

AATGCAGGTT A7CGCCCGCG AACTGAGCGC CCGACT 3CAG ACCGA 3GT 3 3 7GG7GGGCGG 10920 

cgtggaggcj aacatggcca tcgccggg3c gttaaccact cccgg7tgtg cggcgccgct 10930 

ggcgat::tc gacctcggcg ccgg:tcga: ggatgcggcg atcgt:aacg cggaggggca i i c 4 o 

GATAA J 3GC 3 GTCCATCTCG CGGGGGCGGG GAATATGGTC AG7CT3TTGA TTAAAACCGA 111)0 

GCTGGGTTTT 3A3GATCTTT C 3CTGGC3GA AGCGATAAAA AAATA3 2C3C TG3CCAAAGT 111 tO 

G3AAAG:CTG TTCAGTATTC GTCACGAGAA TG3CGCGG7C GA3TTCTTTC GGGAAGCCCT 11220 

CAGCCC3GC3 GT37T3G3CA AAGTG3T3TA 7A7CAAG3A3 G GCG AACTGG TGCCGATCGA 11230 

TAACGCCAG: CC3CTGGAAA AAATTCG7CT :'37GC3CC3G CA3GCCAAAG AGAAAGTGTT 11342 

TGTCAC 2AA2 7GCCTGC3CG CGCT3CGCCA G3TCTCACCC GGCG3TTCCA TTCGCGA7AT 114 01 

:;r-CGTTT,;7 - GTGC PGGTGG G^GGCTTGATC 72TG3AC7TT GAGATCCCGG A3CTTATCAG Ll4c0 

3GAAGCC77G TC 3 C AC TAT 3 G3373G7CGC 73G3CAGGG7 AATA77CJGG 3AACA 3AA3 3 11d2.j 

GCTGTlirAAT C-CGGTGGCCA "CGGGC7GC7 ACT3G73GGT CAGGCGAAT7 AAACG33CGC 11561 
tcgcgccagc gtctctcttt aacgt3ctat tt gag gat 3 c c3ataatgaa gcaga'gttct 11 640
agcttaagcg ggcagtgcgt ggccgagttt cttggcacc: gattgctcat tttcttcggc iitog 

G 7GGGCTGCG TCGCTGCG:T GCGGGTCG3C GG3GCCAGCT TTGGTCAGTG GGAGA7CAGT ! 17 6C 

attatct jGG gccttggcgt cgccatgg : : at:tacctga cgg:cggtgt ctucggcgcg :152c 

CACCTAAATC CGG23GT3AC CATTGC2CT3 TG3CT3TTC3 c:?3ttttga ACGCCGCAAC il^C 

g:gctg:cg? ttattgttgc ccagacgggc ggggcgttct g-:g:cgccgc gctggtgtat 1194c 

GGGCTC7ATC 3CCA3CTG FT 'i'CTCGATCTT GAACAGAGTC AGCATATCGT GCGCGCCACT 120OC 

GCCGCCAGTC TTAACCTGGC CGGGGTCTTT TC3AC3TAC Z CGCATGCACA TATCACTTTT 12C6C 

ATACAA3CG? TTGCC3TGGA GACCACCATC ACGGCAATCI TGATGGCGAT GATCATGGCC 1212C 

CTGACC3ACG ACGG3AACGG AATTC 1214 5 

(2) INFORMATION FOR SEQ ID NO:20:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 94 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:20:

AG CT TAG GAG T 7TAGAATAT TGACOTCGAA TTCXGGGCA TGCGGCA.CCG GATCCAGAAA 60 

AAAGCCCGCA CCTGACAGTG CG3GCTTT7T TTTT \<4 

(2) INFORMATION FOR SEQ ID NO:21:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 37 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:21:

GGAATTCAGA TCTCAGCAAT GAGCGAGAAA AC 3 AT GC 3^ 

(2) INFORMATION FOR SEQ ID NO:22:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:22:

23TCTAGATT A32TTCCTTT AC3CAG3 ^ 

(2) INFORMATION FOR SEQ ID NO:23:
(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1668 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: DHAB1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

ATGAAAAGAT CAAAACGATT TGCAGTACTG GCC3AGCGCC CCG7CAATCA GGACGGGCTG 60 

ATTGGCGAGT GGCCTGAAGA GGGGCTGATC GCCATGGACA GCCCCTTT3A CCCGGTCTCT 12 C 

TCAGTAAAAG TGGACAACGG TCTGATCGTC GAA3TGGACG SCAAACGCCG GGACCAGTTT ISC 

GACATCATCG AGCGATTTAT CGCCGATTAC GOGATCAACG TTGAGGGCAC AGAGCAGGCA 24 C 

ATGCGCC7G3 AGGCGGTGGA AATAGCCCGT ATGCTGGTGG ATATTGACGT CAGCC 3GGAG 30C 

GAG AT -AT 7 3 C2ATCACTAG CGCCATCACC GCGGCCAAAG CGGTCGAGGT GATGG 3GCAG 360 

ATGAACGTG j T 3GAGATGAT GATGGCGCTG CAGAAGATGC GTGCCCGCCG GACCC 3CTCC '120 

aaccagtg:: a:gtcacgaa tctcaaagat aatccggt agattgccgc tgaggtcgcg ;&o 

gagggiggga tgggcggitt ctcagaacag 3AGAc:ac3G tcggtatcg: GCGCTACGCG 54 

CCGTTTAAG3 CIGTGGGGIT GTTGGTCGGT TC3CAGTG3G GCGGGCCCCG CGTGTTGACG 600 

CAGTGCTC'3 3 T 3GAAGAGGC CAC 3GAGCTG GA3CTGGG 3A TG3GTGGCTT AACCA3CTAC 66 

GCCGAGA333 T3TCGGTCTA CG33ACCGAA GC3GTATTTA CC3ACGGC0A T GAT A 3GCCG "2 
TG 3T IAAAG 3 CG ITG 3T I G3 TTCGGCG TAG GG 3TCICG3G GG 21 JAAAAT GC3 3TACAC3 

TC3GGCAI33 GATGG 3AAGG GCTGATGGGG TATTC3GAGA GCAA 3TGGAT G3T ITACCTC 8- 

GAATIGG^II GGATC P7CAT FACOAAAGGC G J JGGGGT 2C AGGGAGT 3CA AAACGGGGI3 ?C3 

GTGAG3T jrTA T G 3 GG AT 3 AG IGGC33TGTG GG 3TCGGGCA T TCG 3-^G 3GT j GT ; j GG G 3AA ? 6 J 

AA3CTGAT33 3CTCTATG 31 GGA3 3 TGGAA IT3GC3T3CG CGAA2GA0CA GACTTT JTC 3 1020 

3ACTGGGATA TTCGC3GTAC GGG3CGCACG GT3ATGCA3A TGCTGCCGG3 GA3CGAGTTT 108 0 

ATTTTCTC33 G:TA3AGC3C GGTGCGG.AAG TACGA3AACA TGTT3G3CG3 CTCGAACTTC 1140 

3ATG 3 GG AA 3 ATTTTGAT3A TTA IAA 3ATC IT 3CAGCGT G AGGT3AT3GT TGA3G33G3C 1200 

3TG2GT3C33 T3ACCGA3GG GGAAAC 3 ATT IGGATTGGG3 AGAAAGGIGG 3GG 3GGGAT 3 1250 

GA3GGG-TTT TGG^CGA.GCT _-^3j3TjCG3 CCAA.Cj_Co nGj. u . jGn j j. j^A.3333G3C _j*jI 

AGGTAG3C3G A G G G G A G G AA GGAGAT3GGG CG 3CGTAACG TGGT3GAG3A TCT GA3TG 2G "3-0 



. J « S sJ.'V JM . OM . 'J! v. .-vrtv 
A3CGGCTTTG AGGATATCGC CAGCAATATT CTCAATATGC TGCGCCAGCG GGTCACCGGC 1500

GATTACCTGC AGACCTCG32 CATTCTCGAT :GGCAG7TCG AGGTGGTGAG 7GCGGTCAAC 1560

CACATCAATG ACTATCAGGG GCCG33CACO GGCTATCGCA TCTCTGCC3A ACGCTGGGCG 1620

GAGATCAAAA ATATTC2G3G CGTGGTTCAG CCCGACACCA TTGAATAA

(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 585 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(A) ORGANISM: DHAB2

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

GTGCAACAGA CAACCCAAAT TCAGCCCTCT TTTACCCTGA AAACCCGCGA GGGCGGGGTA 60

GCTTCTGCCG ATGAACG3GC CGATGAA3TG GTGATCGGCG TCGGCCCTGC OTTCGATAAA 120

CACCAGCATC ACACTCTGAT CGATATGZCZ CATGGCGCGA TCCTCAAAGA GCTGATTGCC 180

3GGGTGGAAG AAGAGGG3CT TCACGCCCCG GTGGTGCGCA ttctgcgcac GTCCGACGTC 240

TCCTTTA.TjG CCTGGGATGC GGCCAACCTG A3CSGCTCGG GGATCGGCAT CGGTATCCA3 300

tcgaagggjA ccac"gtcat ccatcag xc :,a?ctgctgc cgctcagcaa CCTG3AGCT3 360

rtctcccagg cgcc3ctgct 3acgctg3a3 acctacc3gc agattggcaa AAAC3CTGCZ 420

2gctatgcg2 gcaaagagtc accttcgcc3 gtgccggtgg tgaacgatca GATGGTGCG3 480

:cgaaatt:a tggccaaagc cgcgctattt catatcaaag agaccaaaca TGTGGTGCAG 540

3acgccga3c 2cgt2accct gcacatc3ac ttagtaa3gg agtga 585

(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 426 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(A) ORGANISM: DHAB3

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

~"""TGCG CGTGCAGGAT CATCCGTTAo CCACCCGCTG :CCGGAGCAT

ACCGGCAA AC CAT T GAL C J AT AT TAG CO 7CGA-AAGGT

CAGGATGT GCGGATCTCC CGCCA3ACCC TT^A-jTACCA
GCCATTCCT3 ACGAGC3CAT TCT3GCTATC TATAACGCGC TGC3CCCGTT CCGCTCCTCG 300
CAGGCGGAGC TGCTGGCGAT CGCGGACGAG CTGGAGCACA CCTGGCATGC GACAGTGAAT 3e"0 
GCCGCCTTTG TCCGGGAGTC GGCGGAAGTG TATCAGCAGC 3GCATAAGCT GCGTAAAGGA 4 20 



(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1164 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(A) ORGANISM: DHAT

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:



ATGAGCTATC GTATGTTTGA TTATCTGGTG OCAAACGTTA ACTTTTTTGG CCCCAACGCC 60 

ATTTCCGTAG 7CGGCGAACG CTGCCAGCTG GTGG3GGGGA AAAAAGCCCT GCTGGTCACC 120 

GACAAAGGCC TGCCGGCAAT TAAAGATGGC GCGGTGGACA AAAC3CT3CA TTATCTG CGG 120 

GAGGCCGGGA TC3AGGT3GC GATCTTTGAC GSCGTCGAGC CGAA3CCGAA AGACACCAAO 2-10 

GTGCGCGACG GCCTCGCCGT GTTTCGCCGC GAACAGTGCG ACATCATCGT CACCGTGGGC 3 30 

GGCGGCA3C0 OGCAOGATTG C3GCAAAGGC ATCGGCATCG CCCCCACCCA TGAGGGCGAT 3n0 

CTGTACCAGT ATGCCGGAAT C3AGACCCT3 ACCAAC GCGC TGCCGC CTAT CGTCG Z 3GTC 4-0 

AA7ACCACCG CCGG^ACCGC CAGCGAGGTC ACCCGC-ACT GCGTCCTGAC 3AACACCGAA 4 80 

AC3AAA3TGA A3TTTGTGAT :GTCA3CTG3 CG3AAA:TG: CGTC3GTCTC rATCAA33A7 34 C 

CCACTGCTGA TGATCGGTAA ACCGG3CGCC CT-3AC3 3CG3 CGACCGGGAT 3GATC OGCTG 60C 

ACCCA:GCC3 TAGAGGCCTA TAT CTCCAAA 3AC 3CTAACC C3GTGAC 3GA :3C3GC:GCC 66C 

ATGCAG 3CGA TCCOCCTCAT ZGCZCGC?J\Z CTGCGCCAGG CCGTGGCCCT :CG3AG:?JVI "2C 

CTGCAG 3CGC GGGAAAACAT 3GCCTATGCT 7CTCTGCT3G 3CGGGATGGC TTTCAA7AAC /60 

GCCAACGTZG GCTACGTGCA CCCCATGGC3 :ACCAGCT3G 3CGGCCTGTA CGACATGCCG 640 

cac3gc3tgg ccaacgctgt cct-3ctgcc j catgt3gcgc 3ctacaacct gatcgtcaac 900 

c:c3agaaat tcgccgatat cgctcaac7g atg3gc3aaa atat caccgg act 3t go act 960 

:tc3a:gc3G cggaaaaagc catjgccgct atop.cg:g~c igtcgatoga tatugtatt :o.-' 

"ggtagcatc 7gcgcgatct gg3ggtaaaa 3 ago ccg act tcccctacat ggc 33a3atg 1080 

3CTCTAAAAG AC3GCAATGC GT7CT0GAAC C 7GC37AAAG G0AA3GAGCA G3AGA7T3CC 114 0 

3CGATTTT0C GCCA33CATT CTGA 1^64 



AGCTAA 
(2) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1360 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(A) ORGANISM: GPD1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:



CTTTAATT7T CTTTTATC7T ACTCTCCTAC ATAAGACATC AAGAAACAAT T3TATAT7GT 60 

ACACCCCCCC CCTCCACAAA CACAAATATT GATAATATAA AGATGTCTGC TGCTGCTGAT 120 

AGATTAAACT TAAC77CCGG CCAC7TCAAT GCTGGTAGAA A3AGAAG7TC CTCTTCTGTT 1*0 

TCTTTGAAGG CTGCCGAAAA GCCT7TCAAG GTTACTG7GA T7GGATC7GG TAAC7GGGGT 2^0 

ACTAC7AT7G CCAAGGT3GT TGCCGAAAAT T3TAAGGGAT ACCCAGAAGT T7TCGCTCCA J.:Q 

AT AG T AC AAA TG7GGGT 3TT CGAAGAAGAG A7CAATGGTG AAAAATTGAO TGAAATCATA 2 60 

AATAC7AGAC ATCAAAACGT GAAA TAC7TG CCTGGCA7CA C7CTACCCGA CAATTTGGT7 410 

GCTAA7CGAG ACT T GAT l'GA TTCAGTCAA3 GATG7C 3ACA TCATCGTTT7 CAACATTCCA 4 SO 

CATCAATTTT TGCC 3CGTAT CTGTAGCCAA T7GAAAGGTC A7GT7GATTC ACACGTCAGA c :-0 

GCTATCTCGT GTCTAAA 3GG TT7T jAAGT i GGTGCTAAAG G7GTCCAAT7 GCTA7CCTC7 i:00 

TACATCACTG AGGAACTAGG TA7T 3AA7GT GGTGC TOT AT CT 3GTGCTAA CATTCCIACC -60 

GAAGTCGCTC AAGAACAGTG GTCT iAAACA ACAG7TGCTT ACCACATTCC AAA3GATTTC ''20 

AGAGG3GAGG GCAA3GACGT CGAC3ATAAG GTTOTAAAGG CDTTGTTCCA CAGACCTTAC ^80 

ttc:acjTta gtgtsatcga agatsttgct g jTatc tcca mtgtggtgc r itgaa 3aag c40 

Z-T7 3TTGCCT TAGG7TGTGG TTTCGTCGAA GGTITA3GCT 3 3GGTAA-AA IGCTTCTGCT 900 

7CCATCDAAA GAGTOGGTTT GGGTGAGATC ATCAGAPTOG 3TCAAA7GTT PTTCCCAGAA i60 

TC7AGAGAAG AAACATACTA CCAAGAGTC7 GOT 3GT 3TTG 0TGATT7 3 AT :ACCA00TG: 100C 

3C7GGTGGTA GAAACGTCAA GGTTGCTAGG 0TAATG3CTA 7TTCT3GTAA SGACGCCT'JG 108C 

GAAT3TGAAA AGGAGTTGTT GAATGGCCAA TCCGCTCAA3 3TT7AA7TAC 3TGCAAA3AA 1 14 C 

3TTCACGAAT GGTTGGAAAC A7GTGGCTCT GTCGAAGAGT 7CCCATTATT 7GAAGCCGTA 120C 

TAC CAAATCG T7TACAACAA CTACTCAATG AAGAACCT3C C33ACATGAT 7GAAGAAT7A 1 260 



3ATC7A0AI0 AAGATTAGA7 T7A77 3GAGA AA3ATAACAT AT OAT 



000CA0TT7T 



TTCGAG3CTC T7CTATA7CA TAT 7 CAT AAA 7 TAG CAT TAT 



GT7AT7TC7C 



A T AA C T A C 7 T 




(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2946 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(A) ORGANISM: GPD2

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:



3AATTCGAGC CTGAAGTGCT GA7TACC7TC A3GTAGACTT CA7CTTGACC CATCAACCCC f/0 

AGCGTCAATC CTGCAAATAC ACCACCCAGC AGCACTAGGA T GAT AG AG AT AATATAGTAC 120 

GT3GTAACGC TTGCCTCATC ACCTACGCTA TGGCCCGAAT C3GCAACATC CCTAGAATTG lUC 

AGTACGTGTG ATCCGGATAA CAACGGCAGT GAATATATCT T:GGTATCGT AAAGATGTGA 2 -i C 
TATAAGATGA TGTA7ACCCA A? G AG GAG C 3 CCTGATCGTG ACCTAGACCT TAGTGGCAAA 

AACGACA7AT CTATTATAGT GGGGAGAGTT TCGTGCAAAT AAOAGACGCA GCAGCAAGTA 36C 

ACTGTGACGA TATCAAC.TC7 TTTTTTATTA TGTAATAAGC AAACAAGCAG GAATGGGGAA 4 2£ 

AGC77ATGTG CAATCACCAA GGTCGTCCC7 TTTTTCCOAT TTGGTAATT7 AGAATT7AAA 44 C- 

GAAACCAAAA GAA7GAAGAA AG AAA AC AAA TAC7AGCCC7 AACCCTGACT TCGTTTCTAT 5-10 

GATAATACCC TGCTTTAATG AACGGTATGC CCTAGGGTA7 ATC7CACTC7 3TACGT7ACA bOU 

AA0TCCGGT7 ATT7TATCGG AACATCCGAG CACCCGCGCC TTCCTCAACC CAG3CACCSC 660 

rCCAGG7AAC 7GTGCGCGAT GAGCTAATCC 7GAGCCATCA CCCACCCCAC CCG7T GATGA 7 2 0 

-AGCAA7TCG GGAGGGCGAA AATAAAAC7G OA 3CAAG GAA 7 TAG CATC AC CGTCA7CA7C 7-6 0 

ACCATCATAT CGCCT7AGCC CCTAGCCATA GC:AT:ATGC AA3CGTGTAT CTTCTAAGAT 84 3 

TCAGTCATCA 7 : AT T AC CG A GTTTG7TT7C C7TCA3ATGA 7GAAGAAGGT 7TGAGTATGC 90 0 

TCGAAACAAT AAGA.CGAGGA TGGCTCTGCC AT TGG7TA7A TTAC GCTTTT CCGC-03AGGT "**0 

G ICGATGGGT T3CTGAG3GG AAGAGTGTTT AGCTTACGGA CC7AT7GCCA TTGTTATTCC K'20 

3AT7AAT CTA TTGTTCA3CA GCTCTTC"CT ACGCTGTCAT TC7AG7ATTT <tTTTTTTTTT 108 0 

TTT7TGGT7T TACTTTT7TT TCTTCTTGCC TTTTTTTCTT GT7ACTTTTT TTCTAGTTTT 114 0 

7TTTCCTTCC ACTAAGC7TT TTCC7TGATT TATCCTTGCG TTCTTCT7TC TACTCCT7TA '23 J 

GATTTTTTTT TTATATATTA ATTT7T.AAGT T7ATGTA7TT TGGTAGATTC AATTCTC7TT 1260 

CCCTT7CCTT TTCC7T7GC7 CCCC77C:TT A7CAATCCTT GCTG7CACAA GATTAACAAG 1320 

AT AC AC ATT 3 CTTAAGCGAA CGCATCC CGT GTTATATACT CGTCGTGCAi' A7AAAAITT T IjtJO 

GCC7TCAAGA TCTACTTTCC TAAGAAGA7C ATTATTACAA A7ACAACTGC ACTCAAAGA7 I -M C 
GAC7GC7CAT ACTAATATGA AAC AG OA _\AA AC ACT 17CA7 G AG G AC CATC CTATCAGAAG 

A7CGCACTC7 3CCCT37CAA 7TGTACATTT GAAACG7GCG CCC7TCAAGG FTACAGTGAC 1-6C 
TGGTCCI'GGC AAJCG3GGGA CCACCATZGC CAAACTCA7T GC3GAAAACA CAGAA.T7GCA 

TTGCCATATC 77 7GAGCCAG AGGTGAGAAT GTGGGTTTTT GATGAAAAGA TCGGC3AC3A lOfcj 
aaatct sac 3 gatat 3ataa atacaagaca ccagaac3tt aaatatctac c3aatattca ".74 *■

ictgccicat aatctagtgg ccgatgctga tc7tttacac tccatcaagg g7gc7gaca7 icuo 

j :ttg7 . r: . :: pa.2w2ccvc atcaattttt accaaacata gtcaaacaat t 3 3aaggcca 25 60 

:g-tggc:c:t catgtaaggg ccatctcgtg tctaaaaggg ttcgagttgg gticcaaggg :92v 

tgtgcaatt.- ' .tatc :tc :t atgttactga tgagttagga atccaatgtg G23Cactatc 19r?: 

tggtgcaaa: ttggcaccgg aagtggccaa ggaggattgg tccgaaacca cjgtggctta 2C<;: 
icaactacca aaggactatc aaggtgatgg caaggatgta gatcataaga ttttgaaatt 

3:tgtt:ga: agaccttact tccacgtcaa tgtcatcgat gatgttgct 3 gtatatccat :idj 

tgccgg rgcg t 3aa3aacg tc3tggcact tgcatgtggt ttggtagaa3 gtatgcgatg 2220 

33gtaacaat gc2tccgcag ccattcaaag gctgggttta ggtgaaatta tcaagttcgg 2230 

7agaa7gt7t ttcccagaat ccaaagtcga gacgtactat caagaatccg ctggtgttcc 2 2 40 

agatc7ga7g ac gacctgct caggcggtag aaacgtcaag gttgccacat acatggccaa l4 30 

gaocggtaag ccagccttgg aagcagaaaa g3aattgctt aacggtcaat c 3gcccaagg 2 4 60 

3ataa7caca 7g3agagaag ttcacgagtg g3tacaaaca t3tgagttga cccaagaat7 2 5 20 

c 3caa7tatt ogag 3cag7c taccagatag i' 2tacaacaa cgtccgcatg 3aagacctac 25-30 

■2 3ga3atgat 7gaagagc7a ga2atcgatg a3gaatagac actctcccc3 2c:ctccccc 254 ) 

tctgat3ttt :ctgt7gcct 2tttttcccc "aaccaa7tt a7ca7tata 3 a2aag i' i'cta 27)0 

3aactactac ta3taacatt act ac ag tt a ttataatttt 3tattctctt t7tctttaa3 2760 

aatctatcat taac3ttaat ttctatatat a 3ataac7ac cattataca 2 3ctattatc3 2b 20 

tttacatatc acatcaccgt taatgaaaga tac3acaccc tgtacactaa cacaattaaa 28-30 

7AATCG2CAT AA3CT7TTCT GTTATCTATA G3CCTTAAAG C7GTTTCTT0 3AGCT Fi'TCA 2?4 0 

CT3CAG 2 94 6 
(2) INFORMATION FOR SEQ ID NO:7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3178 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(A) ORGANISM: GUT2

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:



UToCAGAACT 



TGT3CCCATC CTC3CGGTTA 3AAAGAAGCT GAATT3TTT2 




CAATA AT C ACT GC A3 TAATTCCTTT TTAGCAACAC 



ATAC7TATAT 
TAAAC7TAAA agagaacagc CACAAATAGG GAACTTTGGT CTAAACGAAG GACTCTCCCT 300

CCCTTATC7T gaccgtgcta TTGCCATCAC TGCTACAAGA CTAAATACGT ATTAATATAT 360

GTTTTCG37A ACGAGAAGAA GAGCTGCCGG TGCAGCTGCT GCCATGGCCA CAGCCACGGG 420

GAL'GC'i'GTAC TGGATGACTA GCCAAGGCGA TAGGCCGTTA GTGCACAATG ACCCGAGCTA 480

CATCG7GCAA TTCCCCACCG CCGCTCCACC GGCAGGTCTC TAGACGAGAC CTGCTGGACC 540

GTCTGGACAA GACGCATCAA TTCGACGTGT TGATCATCGG TGGCGGGGCC ACGGGGACAG 600

GATGTGCCCT AGATGCTGCG ACGAGGGGAC TCAATGTGGC CCTTGTTGAA AAGGGG3AT7 660

TTGCCTCGGG AACGTCGTCC AAATCTACCA AGATGATTCA CGGTGGGGTG CGGTACTTAG 720

AGAAGGCCTT CTGGGAGTTC TCCAAGGCAC AACTGGATCT GGTCATGGAG GCACTCAACG 780

AGCGTAAACA TCTTATCAAC ACTGCTCCTC ACCTGTGCAC GGTGCTACCA ATTCTGATCC 840

CCATCTAGAG CACCTGGCAG GTC2C3TACA TCTATATGGG 2TGTAAATTC TACGATTTCT 900

T7GGCGG7TC CTAAAACTTG AAAAAATCAT ACCTACTGTC CAAATCCGCC ACCGTGGAGA 960

AGGCTCCCAT GCTTACCACA GACAATTTAA AGGCCTCGCT TGTGTACCAT GATGGGTCCT 1020

TTAA-GACTC GCGTTTGAAC GCCACTTTAG GCATCACGGG TGTGGAGAAC GGCGCTACGG 1080

TCTTGATCTA TGTCGAGGTA CAAAAATTGA TCAAAGACCC AACTTCTGGT AAG3TTATCG 1140

gtgc:gaggc CCGGGACGTT GAGACTAATG AGCTTGTCAG AATCAACGCT AAA7GTGTGG 1200

tcaatgc:ac G3GCCCATAC AGTGACGC2A TTTTGCAAAT GGACCGCAAC 2CATCCGGCC 1260

TjCCu'jA'-T^ CCCGCTAAAC GACAACTC2A AGATCAAGTC GACTTTCAAT :aaatctocg 1320

tcatggaccc GAAAATGGTC ATCCCATCTA TTGGCGTTCA CATCGTATTG CCCTCTTT7T 1380

ACTCCCCGAA GGATATGGGT TTGTTGGAGG TCAGAACCTC TGATGGCAGA GTGATGTTCT 1440

TTTTACCTTG GCAGGGCAAA GTCCTTGCCG GCACCACAGA CATCCGACTA AAGCAAGTCC 1500

CAGAAAACCC TATGCGTAGA GAGGCTGATA TTCAAGATAT CTTGAAAGAA C T.AC AG C ACT 1560

ATATCGAATT CCCCGTGAAA AGAGAAGACG TGCTAAGTGC ATGGGCTGGT GTCAGACCTT 1620

TGGTCAGAGA TCCACGTACA ATCCCCGCAG ACGGGAAGAA ggg:tctgcc ACTCAGGGC3 1680

TGGTAAGATC CCACTTCTTG TTCACTTCGG ATAATGGCCT AATTACTATT GCAGG7GGTA 1740

AATGGACTAC TTACAGACAA ATGGCTGAGG AAACAGTC3A CAAAGTTGTC GAAGTTGGCG 1800

GATTCCACAA GCTGAAACCT TG7CACACAA GAGATATTAA GC7TGCTGGT GCAGAAGAAT 1860

GGACGGAAAA GTATGTGGGT TTATTGGCTC AAAACTACCA TTTATCATCA AAAATGTCCA 1920

actaottggt TCAAAACTAC GGAAZCCGTT CCTCTATGAT TTGCGAA7TT TTCAAAGAA7 1980

23ACGGAAAA TAAACTGCCT TTG7CCT7AG CGGAGP-AGGA AAATAACGT^ AICTACTCTA 2040

CAACTCGGTC AA T T T T i «j AT .* C777CAGA7A TCCATTCACA ATCGGTGAGT 2100

AAA. GT ATT C C AT G C A.GT A C GAATATTGTA GAACTCCCTT GGACTTCCTT TTAAGAAGAA 2160

2AAGA7TCG3 v_ TT ~ TT juav GCCAAGGAA2 TTTTGrv-v - o - CGTGCATGCG AC3GTCAAAZ 2220
TTAT3GG7GA TGAG7 TCAAT T3GTCGGAGA AAAAGAGGCA G7GG3AACTT GAAAAAACTG 223C

TGAACTTCAT CCAAGGACGT TTCGGTGTCT AAATCGATCA TGATAGTTAA GGGTGACAAA 2 34C 

GATAACA7T0 ACAAGAG7AA TAATAATGGT AATGATGATA A7AA7AATAA TGATAGTAAT 2 40C 

AACAATAATA ATAA7GG7GG TAAT3GCAAT GAAATCGCTA TTAT7ACCTA TTTTCCT TAA J4t,l 

TGGAAGAGTT AAAGTAAACT AAAAAAACTA CAAAAATATA TGAAGAAAAA AAAAAAAAG A 2:2C 

GGTAATAGAC TCTACTACTA CAATT jATCT TCAAATTATG ACCTTCCTAG TGTT7ATATT .JdBC 

CTATTTCCAA TACATAATAT AATCTATATA ATCATTGCT3 G7AGACTTCC GTTTTAATA7 2 54 0 

CGTTTTAATT ATCCCCTTTA TCTCTAGTCT AGTTTTATCA TAAAATATAG AAA C ACT AAA 2700 

TAATATTCTT CAAACGGT CC TGGTGCATAC GCAATACATA TTTATGGTGC AAAAAAAAAA 2 7 6C 

ATGGAAAATT TTGCTA3TCA TAAAC2CTTT CATAAAACAA TACGTAGACA TCGCTACTTG 2 r J 2 C 

AAATTTTCAA GTTTTTATCA GATCCATGT7 rCCTAT'JTGJ CTTGACAACC TCATCGTC3A 2 3 & C 

AATA3TACCA TTTAGAACGC CCAATATTCA CATTGT3TTC AAGGTCTTTA TTCACCAGT 2 294 C 

ACGTGTAATG GCCATGATTA ATGTCCCTGT ATGGTTAAC 2 ACTCCAAATA GCTTATATTT J-DOl- 

CATA3TGTCA T7GTTTTTCA A7ATAATGT? TAGTATCAAT GGATATGTTA CGACGGTGTT 3360 

ATTT7TCTTG G7CAAATCCT AATAAAATC7 2GATAAATGG AT G AC TAA 3 A TTT7TGGTAA 5120 

AGTTACAAAA T7TATCGTTT TCACT3TTGT 2AATTT7TTG TTCTTGTAAT C ACT C GAG 317 8 

(2) INFORMATION FOR SEQ ID NO:8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 816 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(A) ORGANISM: GPP1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:

A7GAAACGTT rCAATGTTTT AAAATATATC AGAACAACAA AAGCAAATAT ACAAACCATC 6 0 

GCAATGCCTT TGACCACAAA AC CTTTATCT TTGAAAATCA ACGCCGCTCT ATTCGATGTT : 2 n 

GACGGTACCA 7CATCATC7C TCAACCAGCC ATTGC7GC7T TCTGGAGAGA TTTCGGTAAA IfcO 

GACAAGCCTT ACTTCGATGC CGAACAC3TT ATTCACATCT CTCACGGTTG GA3AACTTAC 24 0 

3AT3CCATTG CCAAG7TCGC TCCAGACTTT GC7GA7GAAG AATACGTTAA CAAGC7A3AA 300 

3GTGAAAT2C CA3AAAA3TA :GGT3AACAC TCCATCGAAG T T C2AGGTGC T3TCAAGTTG 360 

:T-TAAT727T TGAA3GCCTT GC3AAAG3AA AAAT3 ^GCTG TC 3 2CACCTC T3GTACCCGT 4 L j 

5A2AT3727A AGAAATGGTT C3ACATTTT 7 AAGATCAAGA 7A22A7AATA 2TT2AT2AC2 1 : : 

GCCAATGATG 7 2AA3CAAG2 7AAGCCTCA~ CCAGAACCAT ACT7AAAGG3 TAGAAA23GT 547 
TTGGG^TTCC CAATTAATGA ACAAGACCCA TGCAAATCTA I



GCACCAGCTG GTATTGCTGC TGGTAAGGC? GCTGGCTGTA AAATCGTTG3 TATTGCTACC 560 
ACTTTCGATT TCGACTTCTT GAAGGAAAAG G 3TTGTGACA TCATTGTCAA GAACCACGAA 7 2C 
TCTATCAGAG TCGGTGAATA CAAC3CTGAA A JCGAT3AAG TCGAATCGAT CT TTGACGAC 76C 



(2) INFORMATION FOR SEQ ID NO:9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 753 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(A) ORGANISM: GPP2

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:



ATGGGATTGA CTACTAAACC TCTATCTTTG AAA GTTAACG CCGCTTTGTT CGACGTCGAC 60 

GG7ACCATTA TCATCTCTCA ACCAGCCATT GCT3CATTCT GGAGGGATTT CGGTAAGGAC 12 U 

AAACCTTATT TCGA7GC7GA ACACGTTATC 3AA3TC7CGC ATGGTTGGAG AACGTTTGAT 130 

GCCATTGCTA AGTTCGCTCC AGACTT7GCC AATGAAGAGT ATGTTAACAA ATTAGAAGCT 240 

GAAA7T XGG 7CAAGTACG3 TGAAAAATCC ATTGAAGTCC CAGGTGCAGT 7AAGCTGTGC 3 30 

AACGCTTTGA ACGCTCTACC AAAAGAGAAA TG3GCTGT3G CAACTTCCGG TACCCGTGAT 300 

ATGGCA2AAA AATGGT7CGA GCATCTGG3A AT C AG GAGA 1 CAAAG7ACTT CATTACCGCT 420 

AATGAT 3TCA AACAGGGTAA 3CC7CATC7A GAACCATA7C TGAAGGG CAG GAATGGCT TA 480 

ggatat:cga tcaatgagca agaccctt:: aaatctaagg tagtagtatt tgaagacg:t 54 c 

CCAGCAGGTA T T 2CCGCC3Z- AAAAGCCGC "! GG7TG7AAGA TCATTGGTAT TGCCACTA3T 60C 

TTC3AC7TGG A3T7CCTAAA GGAAAAAGGC TG7GACATCA TTGTCAAAAA CCAC3AA7CC 660 

ATCAGAGTTG GC3GC FACAA TGCCGAAACA GACGAAGTTG AATTCATTTT TGA^3AC7AC 72C 

TTA7ATGCTA A03AC3ATCT GTTGAAATGG TAA "53 
(2) INFORMATION FOR SEQ ID NO:10:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2520 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)



TACT TAT ACG CTAAGGA7GA CT7GTTGAAA TGGTAA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:
TGTATTGGCC ACGATAACCA CCCTTTGTAT ACTGTTTTTG TTTTTCACAT GGTAAATAAC 6f. 

GACTTTTATT AAACAACGTA TGTAAAAACA TAACAAGAAT CTACCCATAC A3GCCATTTC 120 

GT.AAT7CTTC TCTTCTAATT G3AGTAAAAC CATCAATTAA AGGGTGTGGA GTAGCATAGT IF?' 

3A3GGGC7GA CTGCATTGAC AAAAAAA7TG AAAAAAAAAA A 3GAAAAGGA AAGGAAAAAA 2<;l 

AGACAGCCAA GACTTTTAGA ACGGATAAGG T3TAATAAAA TGTGGGGGGA TCCC7GTTCT :;0( 

CGAACCATAT AAAATATACC A7GTGGTTT3 A3TTGTGGCC GGAACTATAC AAATAGTTAT 36-! 
atgtttccct CTCTCTTCCG ACTT3TAGTA ttctccaaac GTTACATATT CCGA7C.AA.GC 

CAGCGCC7TT ACACTA3TTT AAAACAAGAA CAGAGCCGTA TGTCCAAAAT AA7GGAAGA7 4b:, 

TTAC3AAGT3 ACTACGTC 3C GCTTATCGCC AG T ACT GAT G TAGGAACGAC TCCATCCAGA 5 4 =J 

TGCATT3T 37 TCAACAGA7G GGGCCAGGAC GCTTCAAAAC ACCAAATTGA A7ATTCAAC7 60-.' 

TCACCAT:3A AGGGCAAGAT TGGGGTGTC7 G3CCTAAGGA GACCCTCTAC AGCCCCAGCT 660 

CGTGAAACAC CAAACGCCGG TGACATCAAA ACCAGCGGAA AGCCCATCT? TTCTCCAGAA 7 M 
GGCTATGCCA T7CAAGAAAC 3AAATTCCTA AAAATCGAGG AATTGGACT7 3GACTTCCAT 

AACGAACCCA CGTTGAAGTT 3CCCAAACCG GGT7GGGTTG AGTGCCATCC 3CA3AAATTA 8 4 0 

ctgg7gaacg tcg7ccaatg cct7gcctca agtttgctc7 ctc7gcagac fatcaacagc 90.) 
3aacctgtag :aaacggt:t cc:accttac ^ggtmtat scatgggtat agcaaacatg 

agagaaac:a :aattctgt3 3TCc:gc:g: acaggaaaac caattgttaa ztacggtact i o u j 

3tttggaa:g acaccagaac gatcaaaacc gttagaga:a aatgggaaaa :a:cagcg7c ioeo 

GATAGGCAAC 7GCAGCTTAG ACAGAAGACT GGATTGCCAT TGCTCTCCAC GTATTTCTCC 114 0 

TGTTCCAA33 7G ZGCTGG7T 3CTCGACAAT GAGCC7CTGT GTACCAAGGC 3 TAT GAG GAG 1200 

AAC3ACCT3A 7 j TTCGGCAC TG7GGACACA TGGCTGATTT ACCAATTAAC TAAACAAAAG 12 0 J 

GCGTT3GTTT CTGAXTAAC CAAC3CTT 3C AGAAC7GGAT TTATGAACCT CT 2CACT PTA 132) 

aagtacga:a a:gagttg:t 33aatttt:g cg7atcca:a agaacctgat tcacatg ccc 13-.0 

3AAATT 3T 3T :CCCATCTCA ATACTAC jGT GACTT7 33CA TTCCTGATTG GA7AATG3AA 14-30 

^ACCTACACG ATTC jCCAAA AACA3TA2TG CGAGATCTAG TCAAGAGAAA CC7GCCCATA 1500 

GAGGGCTG7C TGGG GGACCA AAGCGCA7CG ATGGTGGGGC AACTCGC7TA CAAA7CC3GT lb 60 

GCT GC AAAAT GTAC7TA7GG TACCGGTCCC T7TTTAC7GT A3AATACGG3 GA3CAAAAAA 162 j 

TTGATCTCrC AACA7GGC3C ACT GACGAC 1' 37AGCATTTT G3TT 2C3ACA TTTG 2AA3AG lO'JO 

CACCG7CC:C AAAAACCA ^A A7T3AGCAAG CCA3ATT7T3 CATTAGAGGG T72C3TC3CT 17 10 

GT3 3C7GGTG CT3TGGTCCA A7GGCTACG7 ^ATAATTTAC GATTGA I'CGA TAAA2CAGA3 1£-3C 

3A73TCGGAC C3AT7GCATC TAC3GT7CC7 GATTCTG3TG GC3TAGTTT7 CGTCCCCGCA 1c6l 

7 T7AGTGGCC TATTCGC7CC C7A7T GGGAC 2 GAG AT CCC A CAGCCACCA7 AA73GGGATG i ?2 ~ 
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TCTCAATTCA CTACTGCCTC CCACATCGCC AGAGCTGCCG TGGAAGGTGT TTGCTTTCAA 1980 

3CCAGSGCTA TCTTGAAGGC AATGAGTTCT GACGCGTTTG GTGAAGG7TC CAAAGACAGG 20 4 0 

GACTTTTTAG AGGAAATTTC CGACGTCACA TA7GAAAAGT CGCCCCTGTC GGTTCT GGCA 2100 

GTGGATGGCG GGATGTCGAG GTCTAATGAA GTCATGCAAA TTCAAGCCGA TATCCTAGGT 2160 

CCCTG7CTCA AAGTCAGAAG GTGTCCGACA GCGGAATGTA CCG CAT TGGG G3CAGCCATT 2220 

GCAGCCAATA 7GGCT7TCAA GGATG7GAAC GAG C GO C CAT TATGGAAGGA C:TACA3GA7 22 3 0 

GTTAAGAAAT GGGTCTTTTA CAATGGAATG GAGAAAAACG AACAAATATC ACCAGA3GC7 2340 

CATCCAAACC 7TAAGATA7T CAGAAGTGAA TCCGACGATG CTGAAAGGAG AAAGCATTGG 2400 

AAGTA7TGGG AAGTTGCCGT GGAAAGATCC AAAGGTTGGC TGAAGGACAT AGAAGG TGAA 2460 

GACGAACAGG TTCTAGAAAA CT7CCAATAA CAACATAAAT AATT7CTATT AACAAT3TAA 2320 
(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 391 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: unknown
(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein

(vi) ORIGINAL SOURCE:

(A) ORGANISM: GPD1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

Met Ser Ala Ala Ala Asp Arq Leu Asn Leu Thr Ser Glv His Leu Asn 
15 1C 15 

Ala Gly Arg Lys Arg Ser Ser Ser Ser Val Ser Leu lys Ala Ala G-U 
20 2 5 30 

Lys Pro ?he Lys Val Thr Val He Glv Ser Gly Asn Trp Gly Thr Tnr 
35 4C 45 

He Ala Lys Val Val Ala Glu Asn Cys Lys Gly Tyr Pre Glu Val rne 
5C ^ 55 6C 

Ala Pro lie Val Gin Met Trp Val Phe Glu Glu Glu lie Asn Gly Glu 
65 ~H ' 75 30 

Lvs Leu Thr Glu He lie Asn Thr Arc His Gin Asn Val Lys Tyr Leu 

65 90 95 

Pro Gly lie Thr Leu Fro Asp Asn Leu Val Ala Asn Pro Asp Leu lie 

ioo 105 n: 

Asp Ser Val Lys Asp Val Asp lie He Va". Phe Asn He Pre ills Gin 
111 ' 120 12. 

Phe Leu Pro Arg He Cys Scr Gin Leu Lys Gly His Val Asp Ser His 

. -. ^ - r i • n 

1 .} U ^ • - • A i u 

Val Arc: Ala He Ser 2vs leu lvs Gly Pne Glu Val Gly Ala lys Gly 
' 4 = 150 1-5 150 
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Val Gin Lou Leu Ser Scr Tyr lie Thr Glu Clu Leu Gly Tie Gin Cys 
165 HC 175 

Gly Ala Leu Ser Gly Ala Asn lie Ala Thr Glu Val Ala Gin Glu His 
18 3 155 190 

Trp Ser Glu Thr Thr Val Ala Tyr His lie Fro Lvs Asp Phe Arq 31 v 
195 2CD 205 

Glu Gly Lys Asn Val Asp His Lys V-j.1 Leu Lys A_a Leu Phe His Arq 
210 '215 220 

?rc Tyr Phc His Val Ser Val He Glu Asp Val Ala Gly lie Ser lie 
225 230 235 240 

Cvs Gly Ala Leu Lys Asn Val Val Ala Leu Gly Cys Gly Phe Val 31u 
245 250 255 

Gly Leu Gly Tro Gly Asn Asn Ala Ser Ala Ala He Gin Arg Val Gly 
260 * 265 270 

Leu Gly Glu He He Arg Phe Gly Gin Met Phe Phc Pre Glu Ser Arg 

2^5 280 285 

Glu Glu Thi Tyi Tyr Gin Glu Ser Ala Gly Val Ala Asp Leu He Thr 
290 295 3C0 

Thr Cys Ala Gly Gly Arg Asn Val Lys Val Ala Arq Leu Met Ala Thr 
305 310 315 320 

Ser Glv Lys Asp Ala Trp Glu Cys Glu Lys Glu Leu Leu Asn Gly Gin 
J2d " 330 335 

Set Ala Gin Gly Leu He Thr Cys Lys Glu Val His Glu Trp Leu Glu 

340 345 350 

Thr Cvs Gly Ser Val Glu Asn ?he Pro Leu Phe Giu A^a Val Tyr Gin 
355 360 365 

lie Val Tyr Asn Asn Tyr Pro Met Lys Asn Leu Pro Asp Met He Glu 

3 70 37 5 3 30 

Glu Leu Asp Leu His Glu Asp 
385 " 390 

(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 384 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: unknown
(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein

(vi) ORIGINAL SOURCE:

(A) ORGANISM: GPD2

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

f-'et Thr Ala His Thr Asn He Lys Gin His Lys His 2ys His Giu Asr 
1 5 10 1 : 

His Pio He Aru Arq Ser Asp Ser Ala Val Ser He Val His Leu Lys 
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Arc Ala Pro Fhe Lys Vai Tnr Vai lie Giy Ser Giy Asn Trp Giy Thr 
35 40 45 

Thr lie Ala Lys Vai lie Ala 31 u Asn Thr Glu Leu His Ser ins He 

50 ' = 5 60 

Phe Glu Pro Glu Vai Arg Met Trp Vai Phe Asp Glu Lys He Giy Asp 

65 1 0 75 8C 

G_u Asn Leu Thr Asp He lie Asn Thr Arg His Gin Asn Vai Lys Tyr 
35 9C 95 

Leu Pro Asn He Asp Leu Pro His Asn Leu Vai Ala Asp Pro Asp Leu 
130 105 110 

Leu His Ser He Lys Glv Ala Asp lie Leu Vai Phe Asn He- Pro His 
115 120 125 

Gin Phe Leu Pro Asn He Vai Lys Gin Leu Gin Giy His Vai Ala Pro 
130 135 140 

His Vai Arg Ala lie Ser Cys Leu Lvs Giy Phe Glu Leu Giy Ser Lys 
145 150 155 160 

Giy Vai Gin Leu Leu Ser Ser Tyr Vai Thr Asp Glu Leu Giy He Gin 
165 ~ 170 175 

Cys Giy Ala Leu Ser Giy Ala Asn Leu Ala Pro Glu Vai Ala Lys Glu 
18C 185 190 

His Trp Ser Glu Thr Thr Vai Ala Tyr Gin Leu Pre Lys Asp Tyr Gin 
195 200 205 

Giy Asp Glv Lys Asn Vai Aso ills Lys He Leu Lys Leu Leu Phe His 
2 1 C " ' ' 215 220 

Arg Pro Tyr Phe His Vai Asn Vai He Asp Aso Vai Ala Giy lies Ser 
225 23C 235 240 

lie Ala Glv Ala Leu Lvs Asn Vai Vai Ala Leu Ala Cys Giy Phe Vai 
245 " 25C 255 

Glu Giy Met Giy Trp Glv Asn Asn Ala Ser Ala Ala He Gin Arg Leu 

2 60 2G5 2~ ? 0 

Civ Leu Giy Glu He He Lys Phe Giy Arg Met Phe Phe Pre Glu Ser 

2 75 280 26 5 

Lys Vai Glu Thr Tyr Tyr Gin Glu Ser Ala Giy Vai Ala Asp Leu He 
290 ' 295 300 

Thr Thr Cvs Ser Giy Giy Arq Asn Vai Lys Vai A- a Thr Tyr Met Ala 
505 310 315 32C 

Lvs Thr Giy Lys Ser Ala Leu Glu Ala Glu Lys Glu Leu Leu Asn Giy 
325 337 j35 

Gin Ser Ala Gin Giy He lit Thr Cys Arg Glu Vai His Glu Trp Leu 

3 4 0 • 4 5 3 5''; 

G.n Thr Cys Glu Leu Tnr G_n Glu Fhe Pro He He Aro Giy Ser Lea 

3 5 r. 3 60' 6 1: 

Pre Asp Ser Leu Gin Gin Arc Fro His Giy Arc Pro Thr Giy Asp Asp 
370 37= " 360 
(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 614 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: unknown
(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein

(vi) ORIGINAL SOURCE:

(A) ORGANISM: GUT2

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

Met Thr Ara Ala Thr Trp Cys Asn Ser Fro Pro Pro Leu His Arc Gin 
1 5 10 15* 

Val. Ser Ara Arg Asp l.eu Leu Asp Arg Leu Asp Lys Thr His GIr. Fhe 

20 ' 2b 30 

Asp Val Leu lie lie Gly Gly Gly Ala Thr Gly Thr Gly Cys Ala Leu 
35 4 0 4 5 

Aso Ala Ala Thr Arg Glv Leu Asn Val Ala Leu Val Glu Lys Gly Asp 
50 55 50 

Phe Ala Ser Glv Thr Ser Ser Lys Ser Thr Lys Met He His Gly Gly 
65 70 ^5 80 

Val Arg Tyr Leu Glu Lys Ala Phc Tro Glu Fhe Ser Lys Ala Gin Leu 
8 5 90 9 5 

Aso Leu Val He Glu Ala Leu Asn Glu Arq Lys His Lea lie Asn Thr 
100 105 110 

Ala Pre His Leu Cys Thr Val Leu Pro He Leu He Pro lie Tyr Ser 
115 120 125 

Thr Trp Gin Val Pro Tyr He Tyr Met Gly Cys Lys Phe Tyr Asp ?he 
13C " 135 14 0 

Phe Gly Glv Ser Gin Asn Leu Lys Lys Ser Tyr Leu Leu Ser Lys Ser 
14 5 * 150 155 160 

Ala Thr Val G_u Lys Ala Fro Met Leu Thr Tnr Asp Asn Leu Lys Ala 
165 170 175 

S^r Leu Val Tvr His Asp Glv Ser Phe Asn Asp Ser Arq Leu Asn Ala 
180 ' H5 ' 19C 

Tnr Leu Ala He Thr Gly Val Glu Asn Gly Ala Thr Val Lei: He Ty~ 
195 200 205 

Vil Glu Val Gin Lys Leu lie Lys Asp Pro Thr Sei Gly Lvs Val He 
210 ' 215 220 

Glv Ala Glu A J a Ara Asn Val Glu Thr Asn Glu: Leu Val Ara He Asr. 
H5 ' 210 035 24l 

Ala Lys Cvs Val Val Asn Ala Thr Gly Fru Tyr Ser Asp Ala He Leu 
24- 250 

Gin Met Asr Arg Asn Pro Ser Glv Leu Pro Asp Ser Pre Leu Asn Asp 

260 ^65 270 
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Asn Ser Lys lie Lys Ser Thr Fne Asn Gin Tie Ser Vai Met Asp Pre 

27 5 23 0 28 : 

Lys Met Val lie Pro Ser lie Gly Val His lie Val Leu Fro Ser Phe 

?9C 295 230 

Tvr Ser Fro Lvs Aso Met Gly leu Leu Asp Val Arg Thr Ser Asp Gly 
305 ' 310 315 32C 

Arc Val Met Phe Phe Leu Pro Trr Gin Gly Lys Val Leu Ala Gly Thr 
325 330 335 

Thr Asp Lie Pro Leu Lys Gin Val Pro Glu Asn Pre Met Pre Thr Gli: 
34C 345 3SC 

Ala Asp lie Gin Asp lie Leu Lys Glu leu Gin His Tyr lie Glu Phe 
355 36C 365 

Pre Val lys Ary Glu Asp Val Leu Ser Ala Trp Ala Gly Val Arg Pro 
370 375 380 

leu Val Arc Asp Pre Arg Thr He Fro Ala Asp Gly Lys Lys Gly Ser 
385 390 395 4 CO 

Ala Thr Gin Gly Val Val Arg Ser His Phe leu Phe Thr Ser Asp Asn 
405 410 415 

Glv Leu lie Thi He Ala Gly Gly lys Trp Tnr Thr Tyr Arq Gin Met 
420 425 430 

Ala Glu Glu Tnr Val Asp Lys Val Val Glu Val Gly Gly Phe His Asn 

4 3 5 4 4 0 4 4 5 

Leu lys Pro Gys His Thr Arg Asp He Lys Leu Ala Gly Ala Glu C-iu 
450 455 460 

Trrj Tnr Gin Asn Tyr Val Ala Leu Leu Ala Gin Asn Tyr His Leu Ser 
465 4"0 475 430 

Ser lys Met Ser Asn Tyr Leu Va. Gin Asn Tvr Gly Thr Arq Ser Ser 
4^5 490 495 

Lie He 2ys Glu Phe p he Lys Glu Ser Met Glu Asn Lyr, Lou Pro Leu 
■jQC 50 5 5 1 0 

Ser Leu Ala Asp Lys Glu Asn Asn Val lie Tyr Ser Ser Glu Glu Asn 
c 15 J 52C 52 5 

Asn Leu Vai Asn Fne Asp Thr Phe Arq Tyr Fro Phe Thr He Gly G.u 
530 535 540 

leu Lys Tyr Ser Met Gin Tyr Glu Tyr Cys Arg Thr Fro Leu Asp Phe 
545 550 555 560 

leu Leu Arc Arg Thr Arc Phe Ala Fne Leu Asp Aid lys Glu Ala Leu 
-65 57 0 5^5 

Asn Aid Vai His Ala Tnr Val Lys Val Met bly Asp Glu Phe Asn ~rp 

5-0 : 5 9 - 

Sei Glu Lys lys Arg Gin Trp Glu leu Gi s lys Thr Val Asn The lie 

5 9 5 -"■ i £ ' 

Sir. Gly Arg Pne Gly Vsl 
(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 339 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: unknown
(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein

(vi) ORIGINAL SOURCE:

(A) ORGANISM: GPSA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

Met Asn Gin Arg Asr. Aid Ser Met Thr Val lie Gly Ala Gly Ser Tyr 

1 5 10 15 

Glv Thr Ala Leu Ala He Thr Leu Ala Arg Asr. Gly His Glu Val Val 
2C 25 30 

Ten Trp Gly His Asp Pro Glu His lie Ala Thr Leu Glu Arg Asp Arg 
" 35 * 4 J 4 5 

Cvs Asn Ala Ala Phe Leu Pro Asp Val Pro Phe Pro Asp Thr Leu His 
5C 55 60 

Leu Glu Ser Asp Leu Ala Thr Ala Leu A.la Ala Ser Arc Asn He Leu 
55 ~ 70 75 80 

Val Val Val Pro Ser His Val Phe Gly Glu Val Leu Arc Gin He Lys 
85 J 90 95 

Pre- Leu Met Arg Pro Asp Ala Arg Leu Val Trp Ala Thr Lys Glv Leu 
100 '.C5 no 

Glu Ala Glu Thr Gly Ary Leu Leu Gin Asp VH Aid Ary Glu Ala Leu 
115 ^ " 12C 125 

Gly Asp Gin He Pro Leu Ala Val He Ser Gly Pro Thr Phe Aid Lys 
130 135 140 

Glu Leu A! -i Ala Gly Leu Fro Thr Ala He Ser Leu Ala Ser Thr Asp 
145 ~ 150 155 160 

Gin Thr Phe Ala Asp Asp Leu Gin 2-ln Leu Leu His Cys Gly Lys Ser 
16*5 ' 170 175 

Phe Arq Val Tvr Ser Asn Pre Asp Pne He Gly Val Gin Leu Gly Glv 
180 185 190 

AH Val l ys Asn Val He Ala He Gly Ala Gly Met Ser Asp Gly He 
195 2C0 205 

Gly Phe Gly Ala Asn Ala Arg Thr Ala Leu He Thr Arc Sly Leu Ala 
-> - r - H 220 

Glu Met Ser .Arg Leu Gly Ala Ala Leu Gly Ala Asp Prt A_a Thr Pne 

2 25 230 23 5 2 4 0 

Met jiy Met A I a Glv i eu He A>p I. en Val Lei: Thr Cvs : h r Asp Asn 
24 V 250 LH 

Gin Ser Arc Asn Ar z Arc Phe Glv Met Met Leu Gly Gin Sly Met Asp 
2 60 265 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 33 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:

GGCCAA3CTT AA3GAGGTTA ATT AAA7GAA AAG 33 

(2) INFORMATION FOR SEQ ID NO: 24:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 26 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:

3CTCTAGATT ATTCAATGGT GTCGGG 2b 

(2) INFORMATION FOR SEQ ID NO: 25:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 42 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:

GCGCCGTCTA GAATTATGAG CTATCGTATG TTTGATTATC TG 42

(2) INFORMATION FOR SEQ ID NO: 26:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 36 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:

7CTGATACGG GATCCTCAGA A1GCCTGGCG GAAAAT 3t 

(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 51 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:



GCGCGGATCC AGGAGTCTAG AATTATGG3A TTGACTACTA AACCTCTATC T 
(2) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 36 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:

GATACGCCCG GGTTACCATT TCAACAGATC GTCCTT 36

(2) INFORMATION FOR SEQ ID NO: 29:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:

TCGACGAATT CAGGAGGA 18

(2) INFORMATION FOR SEQ ID NO: 30:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:

CTAGTCCTCC TGAATTCG 18

(2) INFORMATION FOR SEQ ID NO: 31:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 19 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:

CCAG'TAAGGA CGACAATTC 19 

(2) INFORMATION FOR SEQ ID NO: 32:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 19 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:

:ATGGAATTG TCCTCCTTA 

(2) INFORMATION FOR SEQ ID NO: 33:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 271 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: unknown
(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein

(vi) ORIGINAL SOURCE:

(A) ORGANISM: GPP1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:

Men Lys Arg ?ne Asn Vai Leu Lys Tyr Tie Arg Thr Thr i,ys Ala Asn 
1 S iu 15 

lie Gin Thr lie Ala Met Pro Leu Thr Thr Lys Pro Leu Ser Leu Lvs 

20 25 30 

lie Asn Ala Ala Leu Phe Asp Vai Aso Gly Thr lie lie lit Ser Gin 

35 4 0 4 5 

Pro Ala lie Ala Ala Phe Trp Arc Asp Phe Gly Lys Asp Lys Pro Tyr 
50 55 60 

Phe Asp Ala Glu His Vai lie His lie Ser His Gly Trp Arc Thr Tyr 

65 70 75 SO 

Aso Ala lie Ala Lys Phe Ala Pro Asp Phe Ala Asp Glu Glu Tyi Vai 

85 90 95 

Asn Lys Leu Glu Gly Glu lie Pro Glu Lys Tyr Gly Glu His Ser l_e 
10C l'J5 111 

Glu Vai Pro Gly Ala Vai Lys Leu Cys Asn Ala Leu Asn Ala Leu Pro 
115 12 0 12 5 

Lys Glu Lys Trp Ala Vai Ala Thr Ser Gly Thr Arg Asp Met Ala Lys 
130 ' 135 140 

Lys Trp ?ne Asp lie Leu Lys lie Lys Arq Pre Glu Tyr Phe lie Thr 
145 150 155 160 

Ala Asn Asp Vai Lys Gin Glv Lvs D rc Hi^ Pro Glu Pre Tyr Leu Lys 
165 170 1'5 

31v Aii Asn Gly Le - Gly PL.- ?r . He Asn 3l_ Gin Asi So: l' ; : 

:eo :55 

Ser Lys Lai Vai Vai Phe ,-i.u Asp Ala Pro Ala Glv lie Ala A~o G l y 

1?L : 2 JO 2 L ; 

Lys Ala Ala Gly Cy* Lys lie V*l Sly lie Ala Tnr Thr r h- Asp ie.: 
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Asd Phe Leu Lvs Glu Lys Gly Cys A*p lie He Val Lys Asn His Glu 

225 23C 255 24 C 

Ser lie Arg Val Gly Glu Tyr Asn Ala Glu Thr Asp Glu Val Glu Leu 
245 25C 255 

lie Phe Asp Asp Tyr Leu Tyr Ala Lys Asp Asp Leu Leu Lys Trp 

260 265 27C 

(2) INFORMATION FOR SEQ ID NO: 34:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 555 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: unknown
(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein

(vi) ORIGINAL SOURCE:

(A) ORGANISM: DHAB1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:

Met Lys Arc Ser Lys Arg Phe Ala Val Leu Ala Gin Arg Pro Val Asn 
1 5 IG 15 

Glu Asp Gly Leu lie Glv Glu Trp Pro Glu Giu Gly Leu He Ala Met 
2C 25 30 

Asp Ser Pro Phe Asp Pro Val Ser Ser Val Lys Val Asp Asn Gly Leu 
35 40 45 

lie Val Glu Leu Asp Civ Lys Arg Arc Asp Gin Phe Asp Met He Asp 
50 * 55 60 

Arc Phe He Ala Asp Tyr Ala He Asn Val Glu Arg Thr Glu Gin Ala 
65' " 70 75 80 

Met Arg Leu Glu Ala Val Glu lie Ala Arq Met Leu Val Asp lie His 

85 90 95 

Val Ser Arg Glu Glu He He Ala He Thr Thr Ala He Thr Pro Ala 

10C 1C5 110 

Lys Ala Val Glu Val Met Ala Gin Met Asn Val Val Glu Met Met Met 

115 12C 125 

Ala Leu Gin Lvs Met Arc Ala Arc Arg Thr Pro Ser Asn Gin Cys His 
13C ' 135 14C 

Vai Thr Asn Leu Lys Asp Asn Pro Val Gin He Ala Ala Asp Ala Ala 
145 150 15b 160 

Glu Ala Glv He Arc Sly Phe Ser Hu Gin Glu Thr Thr Val Gly He 
165 170 175 

Ala Arc Tyr Aia Pro Phe Asn Ala leu Ala Leu Leu Va. Gly Ser Gin 

180 1S5 190 



Cys o ± y M.r c t-ro o^y vai L-eu . r.r oi: 



:er Val Giu Giu Aia In: 



,1-j Leu Glu Leu Glv Met Arc Glv Leu 7hr Ser Tyr Ala Giu Thr Vai 
21C 215 220 
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Ser Val Tyr Gly Thr Glu Ala Val Phe Thr Asp Gly Asp Asp Thr Pre 
2H ' 23*5 24 C 

Tip Ser Lys Ala Fhe Leu Aid 3er Ala Tyr Ala Sor Arg Gly Leu Lys 
245 250 255 

Met Arc Tyr Thr Ser Gly Thr Gly Ser Glu Ala Leu Met Glv Tyr Ser 

260 105 270 

Glu Ser Lys Ser Met Leu Tyr Leu Glu Ser Arg Cys lie ?he He Thr 
275 ' 28C 285 

Lys Gly Ala Gly Val Gin Glv Leu Gin Asn Gly Ala Val Ser Cys He 
2 90 ' 2 95 30C 

Glv Met Thr Gly Ala Val Fro Ser Gly He Arg Ala Val Leu Ala Glu 
305 310 315 32C 

Asn Leu lie Ala Ser Met Leu Asp Leu Glu Val Ala Ser Ala Asn Asp 

325 " 330 335 

Gin Thr Phe Ser His Ser Asp He Arg Arg Tar Ala Arg Thr Leu Met 
343 ' 345 35C 

Gin Met Leu Fro Gly Thr Asp Phe lie Phe Ser Gly Tyr Ser Ala Val 
355 " 360 365 

Pre Asn Tvi Asp Asn Met Phe Ala Glv Ser Asn Phe Asp Aid Glu Asp 
370 " ' 375 380 

Phe Asd Asn Tyr Asn He Leu Gin Arg Asp leu Met Val Asp Gly Gly 
365 * ' 390 ' 295 400 

Lou Arg Pro Val Thr Glu Ala Glu Thr He Ala He Arg Gin Lys Ala 
4 0 5 4 10 41b 

Ala Arg Ala He Gin Ala Val Fhe Arg Glu Leu Gly Leu Pro Pre lie 
420 425 430 

Ala Asp Glu Glu Val Glu Ala Ala Thr Tyr Ala His Gly Ser Asn Glu 
4 35 440 4 45 

Met Pro Pro Arg Asn Val Val Glu Asp Leu Ser Ala Val Glu Siu Met 
4 50 455 4 60 

Met Lys Ara Asn He Thr Sly Leu Asp He Val Sly Ala Leu Ser Arg 
465 ' 4^0 475 480 

Ser Glv Phe Glu Asd He Ala Ser Asn lie Leu Asn Met Leu Arc Hn 
48 5 4 90 4 95 

Arg Val Thr Gly Asp Tyr Leu Gin Thr Ser Ala He Lei: Asp Arg -In 
dOC 505 -310 

Phe Glu Val Val Ser Ala Val Asr. Asp He Asn Asp Tyr Gin Gly Pre 

515 52C 525 

Gly Thr Gly Tvr Arg He Ser A-a Glu Arg Trp Ala Glu He Lys Asn 
:3C 5 35 54 0 

He bro Glv Var Va- Jlr. rro Asp Thr Tie Gi i 
(2) INFORMATION FOR SEQ ID NO: 35:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 194 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: unknown
(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein

(vi) ORIGINAL SOURCE:

(A) ORGANISM: DHAB2

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:

Met Gin Gin Thr Tnr Gin lie Gin Pre Ser Phe Thr Leu Lys Thr Arg 
1 5 10 15 

Glu Gly Gly Val Ala Ser Ala Asp Glu Arq A_a Asp Giu Val Vai lie 

2C 25 3C 

Gly Val Gly Pro Ala Phe Asp Lys His Gin His His Thr Leu He Asp 
35 40 45 

Met Pro His Gly Ala He Leu Lys G^u Leu lie Ala Gly Val Glu Glu 
50 55 60 

G'.u Gly Leu His Ala Arg Val Val Arg lie Leu Arg Thr Snr Asp Vai 
65 ' 7ij 75 8 0 

Ser Phe Met Ala Trp Asp Ala Ala Asn Leu Ser Gly Ser Gly lie Gly 

35 90 95 

He Gly lie Gin Ser Lys Giy Thr Thr Val He His Gin Arq Asp Leu 

ioo :C5 no 

Leu Fro Leu Ser Asn Leu Glu Leu Phe Ser Gin Ala Pro leu Leu Tnr 
115 12C 125 

Leu Glu "Thr Tyr Ara Gin He Giy Lys Asn Ala Ala Arg Tyr Ala Arg 
130 * ' 135 * 140 

Lys Glu Scr Pre Ser Pre Val Pro Vai Val Asn Asp Gin Met Vai Arq 
".45 1 50 155 160 

Pro Lys Phe Met Aia Lys Ala A. a Leu Pne His lie Lys Glu Thr Lys 
165 170 175 

His Val Val Gin Asr Ala Glu Pro Va'. Thr Leu His He Asp leu Val 
13C 135 19C 

Arg Glu 

(2) INFORMATION FOR SEQ ID NO: 36:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 140 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: unknown
(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein

(vi) ORIGINAL SOURCE:

(A) ORGANISM: DHAB3






(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:

Met Ser Glu Lys Thr Met Arg Val Gin Asp Tyr Pre Leu Ala Thr Arc 

1 5 10 1: 

Cys Pro Glu His He Leu T:ir Pro Thr Gly Lys Pre Leu Thr Asp He 

20 25 30 

Thr Leu Glu Lys Val Leu Ser Gly Glu Val Gly Pre Gin Asp Val Arc 
j 5 4 0 .15 

He Ser Arg Gin Thr Leu Glu Tyr Gin Ala Gin He Ala Glu Gin Mel 
50 55 60 

Gin His Ala Val Ala Arg Asn Phe Arg Arg Ala Ala Glu Leu He Ala 

65 70 75 80 

lie Pro Asp Glu Arc He Leu Ala lie Tyr Asn Ala Leu Arg Pre Phe 

S5* 90 05 

Arg Ser Ser Gin Ala Glu Leu Leu Ala He Ala Asp Glu Leu Glu His 
100 105 110 

Thi Tro His Ala Thr Val Asr. A. a Ala Phe Val Arc Glu Ser Ala Glu 

115 120 125 

Val Tyr Gin Gin Arc His Lys Leu Arg Lys Gly Ser 
120 135 140 

(2) INFORMATION FOR SEQ ID NO: 37:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 387 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: unknown
(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein

(vi) ORIGINAL SOURCE:

(A) ORGANISM: DHAT

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:

Me*. Ser Tyr Arg Met Phe Asp Tyr Leu Val Pre Asn Val Asn Phe Phe 
1 ' ' 5 1C 15 

G.y Pro Asn Ala He Ser Val Val Gly Glu Arg Cys Gin Lea Leu Gly 
20 15 " 30 

Gly Lys Lys Ala Leu Leu Val Thr Asn Lys Gly Leu Arg Ala He Lys 
35 4J 45 

Asp Gly Ala Val Asp Lys Thr Leu His Tyr Leu Arg Glu Ala Gly He 

5 0~ 5 5 60 

Glu Val Ala He Phe Asp Gly Val Glu Pro Asr. Pro Lys Asp Thr Asn 

O 0 < *j ■ - ri . 1 

Val Arc Asd G 1 y leu A. a Val Phe Arc Arc G". i: Gin Cys Asp He IH 



,'ai Thr Val Gly Gly Gly Sor Pre His .--.sc ~ V s G*y Lys Gly He 
1 C C H5 11C 
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lie Ala Aia Thr His Giu Gly Asp Leu Tyr 3 in Tyr All Gly II* niu 

115 l/.C 125 

Thr leu Thr Asn Pro Leu Pro Pre lie Vai Aia Val Asn Thr Thr Ala 
13C 135 14C 

Gly Thr Ala Ser Glu Val Thr Arc His Cys Val Leu Thr Asn Thr Glu 
ill 15C 155 16C 

Thr Lys Val Lys Phe Vai He Val Ser Trp Arg Lys L«u Pro Ser Vai 
165 17C 175 

Ser lie Asn Asp Pro Leu Leu Met He Gly Lys Pro Ala Ala Leu Thr 
180 1S5 190 

Ala Ala Tnr G_y Met Asp Ala Leu Thr His Aia Val Glu Ala Tyr lie 
195 200 205 

^er Lys Asp Ala Asn Pro Val Thr Asp Ala Ala Ala Met G.n Ala He 
210 ' 215 220 

Arg Leu He Ala Arg Asn Leu Arg Gin Ala Val Ala Leu Gly Ser Asn 
225 230 235 240 

Leu Gin Aia Arg Glu Asn Met Ala Tyr Ala Ser Leu Leu Ala Gly Met 
245 250 255 

Ala Phe Asn Asn Ala Asn Leu Gly Tvr Val His Ala Met Ala His Gin 
260 265 273 

Leu Gly Gly Leu Tyr Asd Met. Pro His Gly Val Ala Asn Ala Val Leu 
2^5 2G0 285 

Leu Pro His Vai Ala Arg Tyr Asn Leu He Ala Asn Pro Glu Lys Phe 

20D 295 300 

A^a Aso He Ala Glu Leu Met Glv Glu Asn He Thr Gly Leu Ser Thr 
305 310 315 320 

Leu Asp Ala Ala Glu Lys Ala He Ala Ala lie Thr Arg Leu Ser Met 

325 ' 330 335 

Asr He Gly He Pro Gin His Leu Arg Asp Leu Gly Val lys Giu Ala 
34 0 34 5 .3 50 

Asp Fne Pro Tyr Met Ala Glu Met Aia Leu Lys Asp Gly Asn Ala Fne 
355 ' 360 365 

Ser Asn Pro Arg Lys Gly Asn Giu Gin Glu lie Ala Ala He Phe Arq 
2^0 ' 37 5 33 0 

Gin Ala Phe 
385 

(2) INFORMATION FOR SEQ ID NO: 38:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:

GCGAATTCAT GAGCTATCGT ATGTTT3 2 7 

(2) INFORMATION FOR SEQ ID NO: 39:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 28 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:

GCGAATTCAG AA7GCCT3GC GGAAAATC 2 b 

(2) INFORMATION FOR SEQ ID NO: 40:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 28 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40:

GGGAATTCAT GAGCGAGAAA ACCATGC3 28 

(2) INFORMATION FOR SEQ ID NO: 41:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41:

GCGAATTCTT ACCTTCCTTT ACCCAGC ?. n 

(2) INFORMATION FOR SEQ ID NO: 42:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42:

GCGAATTCAT GCAACAGACA AC .'CAAATTC 30 

(2) INFORMATION FOR SEQ ID NO: 43:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43:

GCGAATTCAC TCCCTTACTA ACTCG 25 

(2) INFORMATION FOR SEQ ID NO: 44:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44:

GGGAATTCAT GAAAAGATCA AAACGATTTG 30

(2) INFORMATION FOR SEQ ID NO: 45:

(1) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45: 
GCGAATTCTT ATTCAATGGT GTIGGGCCG 2 9 



(2) INFORMATION FOR SEQ ID NO: 46:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 34 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46:

TTGATAATAT AACCAT3GCT GCTGCTGCTG ATA 3 

(2) INFORMATION FOR SEQ ID NO: 47:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 39 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47:

TTATCTT3GA TCCAATAAAT CTAATCTTC 



3 4 
(2) INFORMATION FOR SEQ ID NO: 48:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48:

CATGACTAGT AAGGAGGACA ATTC 24

(2) INFORMATION FOR SEQ ID NO: 49:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49:

CATGGAATTG TCCTCCTTAC TAGT 24
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INDICATIONS RELATING TO A DEPOSITED MICROORGANISM 

(PCT Rule 13bis)

A. The indications made below relate to the microorganism referred to in the description
on page 7 and 8, lines 37 & 38 on pg. 7 & lines 1-5 on pg. 8

B. IDENTIFICATION OF DEPOSIT
Further deposits are identified on an additional sheet [ ]

Name of depositary institution
AMERICAN TYPE CULTURE COLLECTION

Address of depositary institution (including postal code and country)
12301 Parklawn Drive
Rockville, Maryland 20852
US

Date of deposit: 26 September 1996
Accession Number: 98188

C. ADDITIONAL INDICATIONS (leave blank if not applicable)
This information is continued on an additional sheet [ ]

In respect of those designations in which a European patent is sought,
a sample of the deposited microorganism will be made available until
the publication of the mention of the grant of the European patent or
until the date on which the application has been refused or withdrawn
or is deemed to be withdrawn, only by the issue of such a sample to an
expert nominated by the person requesting the sample. (Rule 28(4) EPC)

D. DESIGNATED STATES FOR WHICH INDICATIONS ARE MADE (if the indications are not for all designated States)

E. SEPARATE FURNISHING OF INDICATIONS (leave blank if not applicable)

The indications listed below will be submitted to the International Bureau later (specify the general nature of the indications e.g., "Accession
Number of Deposit")

For receiving Office use only

[ ] This sheet was received with the international application

Authorized officer

For International Bureau use only

[ ] This sheet was received by the International Bureau on:

Authorized officer

Form PCT/RO/134 (July 1992)
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INDICATIONS RELATING TO A DEPOSITED MICROORGANISM 

(PCTRule 136/j) 

A. The indications made below relate io the microorganism referred to in the description 

on page , line s 6 - 12 

B. IDENTIFICATION OF DEPOSIT Further deposits are identified on an additional sheet 

Name of depositary institution 
AMERICAN TYPE CULTURE COLLECTION 



Address of depositary institution (including postal code and country) 
12301 Parklawn Drive 
Rockville, Maryland 20852 
US 



Dale of deposit 


Accession Number 


26 September 1996 


74392 



C ADDITIONAL INDICATIONS (leave blank if not applicable) This information is continued on an additional sheet Q 



la respect of those designations in which a European patent is sought, 
a sample of the deposited microorganism will be made available until 
the publication of the mention of the grant of the European patent or 
until the- date on which the application has been refused or withdrawn 
or is deemed to be withdrawn, only by the issue of such a sample to an 
expert nominated by the person requesting the sample. (Rule 28(4) EPC) 



D. DESIGNATED STATES FOR WHICH INDICATIONS ARE MADE (if the indications are not for all designated States) 



E. SEPARATE FURNISHING OF INDICATIONS (leave blank if not applicable) 



The indications listed below will be submitted to the International Bureau later (specify the general nature of the indications, e.g., "Accession 
Number of Deposit") 



For receiving Office use only 



[ ] This sheet was received with the international application 



Authorized officer 



For International Bureau use only 



[ ] This sheet was received by the International Bureau on: 




Form PCT/RO/134 (July 1992) 
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WHAT IS CLAIMED IS: 

1. A method for the production of 1,3-propanediol from a recombinant 
organism comprising: 

(i) transforming a suitable host organism with a transformation 
cassette comprising at least one of 

(a) a gene encoding a glycerol-3-phosphate dehydrogenase activity; 
(b) a gene encoding a glycerol-3-phosphatase activity; 
(c) genes encoding a dehydratase activity; 
(d) a gene encoding 1,3-propanediol oxidoreductase activity, 

provided that if the transformation cassette comprises less than all the genes of 
(a)-(d), then the suitable host organism comprises endogenous genes whereby 
the resulting transformed host organism comprises at least one of each of genes 
(a)-(d); 

(ii) culturing the transformed host organism under suitable 
conditions in the presence of at least one carbon source selected from the group 
consisting of monosaccharides, oligosaccharides, polysaccharides, or a 
one-carbon substrate whereby 1,3-propanediol is produced; and 

(iii) recovering the 1,3-propanediol. 

2. The method of Claim 1 wherein the transformation cassette 
comprises all of the genes (a)-(d). 

3. The method of Claim 1 wherein the suitable host organism is 
selected from the group consisting of bacteria, yeast, and filamentous fungi. 

4. The method of Claim 3 wherein the suitable host organism is 
selected from the group of genera consisting of Citrobacter, Enterobacter, 
Clostridium, Klebsiella, Aerobacter, Lactobacillus, Aspergillus, Saccharomyces, 
Schizosaccharomyces, Zygosaccharomyces, Pichia, Kluyveromyces, Candida, 
Hansenula, Debaryomyces, Mucor, Torulopsis, Methylobacter, Escherichia, 
Salmonella, Bacillus, Streptomyces and Pseudomonas. 

5. The method of Claim 4 wherein the suitable host organism is 
selected from the group consisting of E. coli, Klebsiella spp., and 
Saccharomyces spp. 

6. The method of Claim 1 wherein the transformed host organism is a 
Saccharomyces spp. transformed with a transformation cassette comprising the 
genes dhaB1, dhaB2, dhaB3, and dhaT, wherein the genes are stably integrated 
into the Saccharomyces spp. genome. 
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7. The method of Claim 1 wherein the transformed host organism is a 
Klebsiella spp. transformed with a transformation cassette comprising the genes 
GPD1 and GPD2. 

8. The method of Claim 1 wherein the carbon source is glucose. 

9. The method of Claim 1 wherein the gene encoding a 
glycerol-3-phosphate dehydrogenase enzyme is selected from the group consisting of genes 
corresponding to amino acid sequences given in SEQ ID NO:11, in SEQ ID 
NO:12, and in SEQ ID NO:13, the amino acid sequences encompassing amino 
acid substitutions, deletions or additions that do not alter the function of the 
glycerol-3-phosphate dehydrogenase enzyme. 

10. The method of Claim 1 wherein the gene encoding a 
glycerol-3-phosphatase enzyme is selected from the group consisting of genes 
corresponding to amino acid sequences given in SEQ ID NO:33 and in SEQ ID 
NO:17, the amino acid sequences encompassing amino acid substitutions, 
deletions or additions that do not alter the function of the glycerol-3-phosphatase 
enzyme. 

11. The method of Claim 1 wherein the gene encoding a glycerol kinase 
enzyme corresponds to an amino acid sequence given in SEQ ID NO:18, the 
amino acid sequence encompassing amino acid substitutions, deletions or 
additions that do not alter the function of the glycerol kinase enzyme. 

12. The method of Claim 1 wherein the genes encoding a dehydratase 
enzyme comprise dhaB1, dhaB2 and dhaB3, the genes corresponding respectively 
to amino acid sequences given in SEQ ID NO:34, SEQ ID NO:35, and SEQ ID 
NO:36, the amino acid sequences encompassing amino acid substitutions, 
deletions or additions that do not alter the function of the dehydratase enzyme. 

13. The method of Claim 1 wherein the gene encoding a 1,3-propanediol 
oxidoreductase enzyme corresponds to an amino acid sequence given in SEQ ID 
NO:37, the amino acid sequence encompassing amino acid substitutions, 
deletions or additions that do not alter the function of the 1,3-propanediol 
oxidoreductase enzyme. 

14. A transformed host cell comprising: 
(a) a group of genes comprising 

(1) a gene encoding a glycerol-3-phosphate dehydrogenase 
enzyme corresponding to the amino acid sequence given in SEQ ID NO:11; 
(2) a gene encoding a glycerol-3-phosphatase enzyme 
corresponding to the amino acid sequence given in SEQ ID NO:17; 
(3) a gene encoding the α subunit of the glycerol dehydratase 
enzyme corresponding to the amino acid sequence given in SEQ ID NO:34; 
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(4) a gene encoding the β subunit of the glycerol dehydratase 
enzyme corresponding to the amino acid sequence given in SEQ ID NO:35; 
(5) a gene encoding the γ subunit of the glycerol dehydratase 
enzyme corresponding to the amino acid sequence given in SEQ ID NO:36; and 
(6) a gene encoding the 1,3-propanediol oxidoreductase enzyme 
corresponding to the amino acid sequence given in SEQ ID NO:37, 
the respective amino acid sequences of (a)(1)-(6) encompassing amino acid 
substitutions, deletions, or additions that do not alter the function of the enzymes 
of genes (1)-(6), and 

(b) a host cell transformed with the group of genes of (a), 

whereby the transformed host cell produces 1,3-propanediol on at least one 
substrate selected from the group consisting of monosaccharides, 
oligosaccharides, and polysaccharides or from a one-carbon substrate. 
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